[tip: core/rcu] rcu: Enable rcu_normal_after_boot unconditionally for RT

2021-02-15 Thread tip-bot2 for Julia Cartwright
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 36221e109eb20ac111bc3bf3e8d5639aa457c7e0
Gitweb:
https://git.kernel.org/tip/36221e109eb20ac111bc3bf3e8d5639aa457c7e0
Author: Julia Cartwright
AuthorDate: Tue, 15 Dec 2020 15:16:47 +01:00
Committer: Paul E. McKenney 
CommitterDate: Mon, 04 Jan 2021 13:43:51 -08:00

rcu: Enable rcu_normal_after_boot unconditionally for RT

Expedited RCU grace periods send IPIs to all non-idle CPUs, and thus can
disrupt time-critical code in real-time applications.  However, there
is a portion of boot-time processing (presumably before any real-time
applications have started) where expedited RCU grace periods are the only
option.  And so it is that experience with the -rt patchset indicates that
PREEMPT_RT systems should always set the rcupdate.rcu_normal_after_boot
kernel boot parameter.

This commit therefore makes the post-boot application environment safe
for real-time applications by making PREEMPT_RT systems disable the
rcupdate.rcu_normal_after_boot kernel boot parameter and acting as
if this parameter had been set.  This means that post-boot calls to
synchronize_rcu_expedited() will be treated as if they were instead
calls to synchronize_rcu(), thus preventing the IPIs, and thus avoiding
disrupting real-time applications.
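
For context, this is roughly how the parameter takes effect (paraphrased
from kernel/rcu/update.c and kernel/rcu/tree_exp.h; locking, tracing and
early-boot details are omitted):

    /* At the end of in-kernel boot, the parameter is latched into rcu_normal. */
    void rcu_end_inkernel_boot(void)
    {
            rcu_unexpedite_gp();
            if (rcu_normal_after_boot)
                    WRITE_ONCE(rcu_normal, 1);
    }

    /* Post-boot expedited requests then fall back to a normal grace period. */
    void synchronize_rcu_expedited(void)
    {
            if (rcu_gp_is_normal()) {
                    wait_rcu_gp(call_rcu);  /* no IPIs */
                    return;
            }
            /* ... otherwise run the IPI-based expedited machinery ... */
    }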

Suggested-by: Luiz Capitulino 
Acked-by: Paul E. McKenney 
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
[ paulmck: Update kernel-parameters.txt accordingly. ]
Signed-off-by: Paul E. McKenney 
---
 Documentation/admin-guide/kernel-parameters.txt | 7 +++
 kernel/rcu/update.c | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 521255f..e0008d9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4474,6 +4474,13 @@
only normal grace-period primitives.  No effect
on CONFIG_TINY_RCU kernels.
 
+   But note that CONFIG_PREEMPT_RT=y kernels enables
+   this kernel boot parameter, forcibly setting
+   it to the value one, that is, converting any
+   post-boot attempt at an expedited RCU grace
+   period to instead use normal non-expedited
+   grace-period processing.
+
rcupdate.rcu_task_ipi_delay= [KNL]
Set time in jiffies during which RCU tasks will
avoid sending IPIs, starting with the beginning
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 39334d2..b95ae86 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -56,8 +56,10 @@
 #ifndef CONFIG_TINY_RCU
 module_param(rcu_expedited, int, 0);
 module_param(rcu_normal, int, 0);
-static int rcu_normal_after_boot;
+static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT);
+#ifndef CONFIG_PREEMPT_RT
 module_param(rcu_normal_after_boot, int, 0);
+#endif
 #endif /* #ifndef CONFIG_TINY_RCU */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC


[tip: locking/core] squashfs: Make use of local lock in multi_cpu decompressor

2020-06-01 Thread tip-bot2 for Julia Cartwright
The following commit has been merged into the locking/core branch of tip:

Commit-ID: fd56200a16c72c7c3ec3e54e06160dfaa5b8dee8
Gitweb:
https://git.kernel.org/tip/fd56200a16c72c7c3ec3e54e06160dfaa5b8dee8
Author: Julia Cartwright
AuthorDate: Wed, 27 May 2020 22:11:16 +02:00
Committer: Ingo Molnar 
CommitterDate: Thu, 28 May 2020 10:31:10 +02:00

squashfs: Make use of local lock in multi_cpu decompressor

The squashfs multi CPU decompressor makes use of get_cpu_ptr() to
acquire a pointer to per-CPU data. get_cpu_ptr() implicitly disables
preemption which serializes the access to the per-CPU data.

But decompression can take quite some time depending on the size. The
observed preempt disabled times in real world scenarios went up to 8ms,
causing massive wakeup latencies. This happens on all CPUs as the
decompression is fully parallelized.

Replace the implicit preemption control with an explicit local lock.
This allows RT kernels to substitute it with a real per CPU lock, which
serializes the access but keeps the code section preemptible. On non RT
kernels this maps to preempt_disable() as before, i.e. no functional
change.

[ bigeasy: Use local_lock(), patch description]

Reported-by: Alexander Stein 
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Ingo Molnar 
Tested-by: Alexander Stein 
Acked-by: Peter Zijlstra 
Link: https://lore.kernel.org/r/20200527201119.1692513-5-bige...@linutronix.de
---
 fs/squashfs/decompressor_multi_percpu.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c 
b/fs/squashfs/decompressor_multi_percpu.c
index 2a2a2d1..e206ebf 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -20,7 +21,8 @@
  */
 
 struct squashfs_stream {
-   void *stream;
+   void *stream;
+   local_lock_t lock;
 };
 
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
@@ -41,6 +43,7 @@ void *squashfs_decompressor_create(struct squashfs_sb_info 
*msblk,
err = PTR_ERR(stream->stream);
goto out;
}
+   local_lock_init(&stream->lock);
}
 
kfree(comp_opts);
@@ -75,12 +78,16 @@ void squashfs_decompressor_destroy(struct squashfs_sb_info 
*msblk)
 int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head 
**bh,
int b, int offset, int length, struct squashfs_page_actor *output)
 {
-   struct squashfs_stream __percpu *percpu =
-   (struct squashfs_stream __percpu *) msblk->stream;
-   struct squashfs_stream *stream = get_cpu_ptr(percpu);
-   int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-   offset, length, output);
-   put_cpu_ptr(stream);
+   struct squashfs_stream *stream;
+   int res;
+
+   local_lock(&msblk->stream->lock);
+   stream = this_cpu_ptr(msblk->stream);
+
+   res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+   offset, length, output);
+
+   local_unlock(&msblk->stream->lock);
 
if (res < 0)
ERROR("%s decompression failed, data probably corrupt\n",


Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

2019-07-26 Thread Julia Cartwright
On Fri, Jul 26, 2019 at 08:30:58PM +0200, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior 
> 
> On PREEMPT_RT enabled kernels hrtimers which are not explicitely marked for
> hard interrupt expiry mode are moved into soft interrupt context either for
> latency reasons or because the hrtimer callback takes regular spinlocks or
> invokes other functions which are not suitable for hard interrupt context
> on PREEMPT_RT.
> 
> The hrtimer_sleeper callback is RT compatible in hard interrupt context,
> but there is a latency concern: Untrusted userspace can spawn many threads
> which arm timers for the same expiry time on the same CPU. On expiry that
> causes a latency spike due to the wakeup of a gazillion threads.
> 
> OTOH, priviledged real-time user space applications rely on the low latency
> of hard interrupt wakeups. These syscall related wakeups are all based on
> hrtimer sleepers.
> 
> If the current task is in a real-time scheduling class, mark the mode for
> hard interrupt expiry.
> 
> [ tglx: Split out of a larger combo patch. Added changelog ]
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> Signed-off-by: Thomas Gleixner 
> ---
>  kernel/time/hrtimer.c |   24 
>  1 file changed, 24 insertions(+)
> 
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1662,6 +1662,30 @@ static enum hrtimer_restart hrtimer_wake
>  static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
>  clockid_t clock_id, enum hrtimer_mode mode)
>  {
> + /*
> +  * On PREEMPT_RT enabled kernels hrtimers which are not explicitely
> +  * marked for hard interrupt expiry mode are moved into soft
> +  * interrupt context either for latency reasons or because the
> +  * hrtimer callback takes regular spinlocks or invokes other
> +  * functions which are not suitable for hard interrupt context on
> +  * PREEMPT_RT.
> +  *
> +  * The hrtimer_sleeper callback is RT compatible in hard interrupt
> +  * context, but there is a latency concern: Untrusted userspace can
> +  * spawn many threads which arm timers for the same expiry time on
> +  * the same CPU. That causes a latency spike due to the wakeup of
> +  * a gazillion threads.
> +  *
> +  * OTOH, priviledged real-time user space applications rely on the
> +  * low latency of hard interrupt wakeups. If the current task is in
> +  * a real-time scheduling class, mark the mode for hard interrupt
> +  * expiry.
> +  */
> + if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> + if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> + mode |= HRTIMER_MODE_HARD;

Because this ends up sampling the task's scheduling parameters only at
the time of enqueue, it doesn't take into consideration whether or not
the task may be holding a PI lock and later be boosted if that lock is
contended by an RT thread.

Am I correct in assuming there is an induced inversion here in this
case, because the deferred wakeup mechanism isn't part of the PI chain?

If so, is this just to be an accepted limitation at this point?  Is the
intent to argue this away as bad RT application design? :)

   Julia


Re: [patch V2 1/1] Kconfig: Introduce CONFIG_PREEMPT_RT

2019-07-18 Thread Julia Cartwright
On Wed, Jul 17, 2019 at 10:01:49PM +0200, Thomas Gleixner wrote:
> Add a new entry to the preemption menu which enables the real-time support
> for the kernel. The choice is only enabled when an architecture supports
> it.
> 
> It selects PREEMPT as the RT features depend on it. To achieve that the
> existing PREEMPT choice is renamed to PREEMPT_LL which select PREEMPT as
> well.
> 
> No functional change.
> 
> Signed-off-by: Thomas Gleixner 
> Acked-by: Paul E. McKenney 
> Acked-by: Steven Rostedt (VMware) 
> Acked-by: Clark Williams 
> Acked-by: Daniel Bristot de Oliveira 
> Acked-by: Frederic Weisbecker 
> Acked-by: Ingo Molnar 
> Acked-by: Peter Zijlstra (Intel) 
> Acked-by: Marc Zyngier 
> Acked-by: Daniel Wagner 
> ---

I'm excited to see where this goes.

Acked-by: Julia Cartwright 

  Julia


[tip:sched/core] kthread: Convert worker lock to raw spinlock

2019-02-28 Thread tip-bot for Julia Cartwright
Commit-ID:  fe99a4f4d6022ec92f9b52a5528cb9b77513e7d1
Gitweb: https://git.kernel.org/tip/fe99a4f4d6022ec92f9b52a5528cb9b77513e7d1
Author: Julia Cartwright 
AuthorDate: Tue, 12 Feb 2019 17:25:53 +0100
Committer:  Thomas Gleixner 
CommitDate: Thu, 28 Feb 2019 11:18:38 +0100

kthread: Convert worker lock to raw spinlock

In order to enable the queuing of kthread work items from hardirq context
even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a
raw_spin_lock.

This is only acceptable to do because the work performed under the lock is
well-bounded and minimal.
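
As an illustration of the use case this enables, a hypothetical caller in
hard interrupt context (the names below are made up for this sketch; the
worker and work item would be set up elsewhere with kthread_create_worker()
and kthread_init_work()):

    static struct kthread_worker *my_worker;
    static struct kthread_work my_work;

    static enum hrtimer_restart my_timer_expired(struct hrtimer *timer)
    {
            /*
             * Runs in hard interrupt context.  With worker->lock now a
             * raw_spinlock_t, kthread_queue_work() no longer takes a
             * sleeping lock on PREEMPT_RT.
             */
            kthread_queue_work(my_worker, &my_work);
            return HRTIMER_NORESTART;
    }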

Reported-by: Steffen Trumtrar 
Reported-by: Tim Sander 
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Tested-by: Steffen Trumtrar 
Reviewed-by: Petr Mladek 
Cc: Guenter Roeck 
Link: https://lkml.kernel.org/r/20190212162554.19779-1-bige...@linutronix.de

---
 include/linux/kthread.h |  4 ++--
 kernel/kthread.c| 42 +-
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index c1961761311d..6b8c064f0cbc 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,7 +85,7 @@ enum {
 
 struct kthread_worker {
unsigned int flags;
-   spinlock_t  lock;
+   raw_spinlock_t  lock;
struct list_head work_list;
struct list_head delayed_work_list;
struct task_struct  *task;
@@ -106,7 +106,7 @@ struct kthread_delayed_work {
 };
 
 #define KTHREAD_WORKER_INIT(worker){   \
-   .lock = __SPIN_LOCK_UNLOCKED((worker).lock),\
+   .lock = __RAW_SPIN_LOCK_UNLOCKED((worker).lock),\
.work_list = LIST_HEAD_INIT((worker).work_list),\
.delayed_work_list = LIST_HEAD_INIT((worker).delayed_work_list),\
}
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 087d18d771b5..5641b55783a6 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -599,7 +599,7 @@ void __kthread_init_worker(struct kthread_worker *worker,
struct lock_class_key *key)
 {
memset(worker, 0, sizeof(struct kthread_worker));
-   spin_lock_init(&worker->lock);
+   raw_spin_lock_init(&worker->lock);
lockdep_set_class_and_name(&worker->lock, key, name);
INIT_LIST_HEAD(&worker->work_list);
INIT_LIST_HEAD(&worker->delayed_work_list);
@@ -641,21 +641,21 @@ repeat:
 
if (kthread_should_stop()) {
__set_current_state(TASK_RUNNING);
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
worker->task = NULL;
-   spin_unlock_irq(&worker->lock);
+   raw_spin_unlock_irq(&worker->lock);
return 0;
}
 
work = NULL;
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
if (!list_empty(&worker->work_list)) {
work = list_first_entry(&worker->work_list,
struct kthread_work, node);
list_del_init(&work->node);
}
worker->current_work = work;
-   spin_unlock_irq(&worker->lock);
+   raw_spin_unlock_irq(&worker->lock);
 
if (work) {
__set_current_state(TASK_RUNNING);
@@ -812,12 +812,12 @@ bool kthread_queue_work(struct kthread_worker *worker,
bool ret = false;
unsigned long flags;
 
-   spin_lock_irqsave(&worker->lock, flags);
+   raw_spin_lock_irqsave(&worker->lock, flags);
if (!queuing_blocked(worker, work)) {
kthread_insert_work(worker, work, &worker->work_list);
ret = true;
}
-   spin_unlock_irqrestore(&worker->lock, flags);
+   raw_spin_unlock_irqrestore(&worker->lock, flags);
return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_work);
@@ -843,7 +843,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
if (WARN_ON_ONCE(!worker))
return;
 
-   spin_lock(&worker->lock);
+   raw_spin_lock(&worker->lock);
/* Work must not be used with >1 worker, see kthread_queue_work(). */
WARN_ON_ONCE(work->worker != worker);
 
@@ -852,7 +852,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
list_del_init(&work->node);
kthread_insert_work(worker, work, &worker->work_list);
 
-   spin_unlock(&worker->lock);
+   raw_spin_unlock(&worker->lock);
 }
 EXPORT_SYMBOL(kthread_delayed_work_timer_fn);
 
@@ -908,14 +908,14 @@ bool kthread_queue_delayed_work(struct kthread_worker 
*worker,
 

Re: [PATCH] iommu/dmar: fix buffer overflow during PCI bus notification

2019-02-20 Thread Julia Cartwright
On Wed, Feb 20, 2019 at 10:46:31AM -0600, Julia Cartwright wrote:
> Commit 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI
> device path") changed the type of the path data; however, the change in
> path type was not reflected in size calculations.  Update to use the
> correct type and prevent a buffer overflow.
> 
> This bug manifests in systems with deep PCI hierarchies, and can lead to
> an overflow of the statically allocated buffer (dmar_pci_notify_info_buf)
> or of slab-allocated data.
> 
>BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0
>Write of size 1 at addr 90445d80 by task swapper/0/1
>CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW   
> 4.14.87-rt49-02406-gd0a0e96 #1
>Call Trace:
> ? dump_stack+0x46/0x59
> ? print_address_description+0x1df/0x290
[..]
>The buggy address belongs to the variable:
> dmar_pci_notify_info_buf+0x40/0x60
> 
> Fixes: 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device 
> path")
> Signed-off-by: Julia Cartwright 
> ---
>  drivers/iommu/dmar.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 6b7df25e1488..9c49300e9fb7 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -145,7 +145,7 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned 
> long event)
>   for (tmp = dev; tmp; tmp = tmp->bus->self)
>   level++;
>  
> - size = sizeof(*info) + level * sizeof(struct acpi_dmar_pci_path);
> + size = sizeof(*info) + level * sizeof(info->path[0]);

This is probably a candidate for struct_size() instead, if that's what
is preferred.
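
For reference, such a conversion would presumably look like the following
(untested sketch):

    /* Same as sizeof(*info) + level * sizeof(info->path[0]), but the
     * multiplication and addition are checked for overflow. */
    size = struct_size(info, path, level);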

   Julia


[PATCH] iommu/dmar: fix buffer overflow during PCI bus notification

2019-02-20 Thread Julia Cartwright
Commit 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI
device path") changed the type of the path data; however, the change in
path type was not reflected in size calculations.  Update to use the
correct type and prevent a buffer overflow.

This bug manifests in systems with deep PCI hierarchies, and can lead to
an overflow of the statically allocated buffer (dmar_pci_notify_info_buf)
or of slab-allocated data.

   BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0
   Write of size 1 at addr 90445d80 by task swapper/0/1
   CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW   
4.14.87-rt49-02406-gd0a0e96 #1
   Call Trace:
? dump_stack+0x46/0x59
? print_address_description+0x1df/0x290
? dmar_alloc_pci_notify_info+0x1d5/0x2e0
? kasan_report+0x256/0x340
? dmar_alloc_pci_notify_info+0x1d5/0x2e0
? e820__memblock_setup+0xb0/0xb0
? dmar_dev_scope_init+0x424/0x48f
? __down_write_common+0x1ec/0x230
? dmar_dev_scope_init+0x48f/0x48f
? dmar_free_unused_resources+0x109/0x109
? cpumask_next+0x16/0x20
? __kmem_cache_create+0x392/0x430
? kmem_cache_create+0x135/0x2f0
? e820__memblock_setup+0xb0/0xb0
? intel_iommu_init+0x170/0x1848
? _raw_spin_unlock_irqrestore+0x32/0x60
? migrate_enable+0x27a/0x5b0
? sched_setattr+0x20/0x20
? migrate_disable+0x1fc/0x380
? task_rq_lock+0x170/0x170
? try_to_run_init_process+0x40/0x40
? locks_remove_file+0x85/0x2f0
? dev_prepare_static_identity_mapping+0x78/0x78
? rt_spin_unlock+0x39/0x50
? lockref_put_or_lock+0x2a/0x40
? dput+0x128/0x2f0
? __rcu_read_unlock+0x66/0x80
? __fput+0x250/0x300
? __rcu_read_lock+0x1b/0x30
? mntput_no_expire+0x38/0x290
? e820__memblock_setup+0xb0/0xb0
? pci_iommu_init+0x25/0x63
? pci_iommu_init+0x25/0x63
? do_one_initcall+0x7e/0x1c0
? initcall_blacklisted+0x120/0x120
? kernel_init_freeable+0x27b/0x307
? rest_init+0xd0/0xd0
? kernel_init+0xf/0x120
? rest_init+0xd0/0xd0
? ret_from_fork+0x1f/0x40
   The buggy address belongs to the variable:
dmar_pci_notify_info_buf+0x40/0x60

Fixes: 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device 
path")
Signed-off-by: Julia Cartwright 
---
 drivers/iommu/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 6b7df25e1488..9c49300e9fb7 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -145,7 +145,7 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned 
long event)
for (tmp = dev; tmp; tmp = tmp->bus->self)
level++;
 
-   size = sizeof(*info) + level * sizeof(struct acpi_dmar_pci_path);
+   size = sizeof(*info) + level * sizeof(info->path[0]);
if (size <= sizeof(dmar_pci_notify_info_buf)) {
info = (struct dmar_pci_notify_info *)dmar_pci_notify_info_buf;
} else {
-- 
2.20.1



Re: [RFC PATCH] arm64/fpsimd: Don't disable softirq when touching FPSIMD/SVE state

2019-02-12 Thread Julia Cartwright
Hello Julien-

On Fri, Feb 08, 2019 at 04:55:13PM +, Julien Grall wrote:
> When the kernel is compiled with CONFIG_KERNEL_MODE_NEON, some part of
> the kernel may be able to use FPSIMD/SVE. This is for instance the case
> for crypto code.
> 
> Any use of FPSIMD/SVE in the kernel are clearly marked by using the
> function kernel_neon_{begin, end}. Furthermore, this can only be used
> when may_use_simd() returns true.
> 
> The current implementation of may_use_simd() allows softirq to use
> FPSIMD/SVE unless it is currently in used (i.e kernel_neon_busy is true).
> When in used, softirqs usually fallback to a software method.
> 
> At the moment, as a softirq may use FPSIMD/SVE, softirqs are disabled
> when touching the FPSIMD/SVE context. This has the drawback to disable
> all softirqs even if they are not using FPSIMD/SVE.
> 
> As a softirq should not rely on been able to use simd at a given time,
> there are limited reason to keep softirq disabled when touching the
> FPSIMD/SVE context. Instead, we can only disable preemption and tell
> the NEON unit is currently in use.
> 
> This patch introduces two new helpers kernel_neon_{disable, enable} to
> mark the area using FPSIMD/SVE context and use them in replacement of
> local_bh_{disable, enable}. The functions kernel_neon_{begin, end} are
> also re-implemented to use the new helpers.
> 
> Signed-off-by: Julien Grall 
> 
> ---
> 
> I have been exploring this solution as an alternative approach to the RT
> patch "arm64: fpsimd: use preemp_disable in addition to local_bh_disable()".
> 
> So far, the patch has only been lightly tested.
> 
> For RT-linux, it might be possible to use migrate_{enable, disable}. I
> am quite new with RT and have some trouble to understand the semantics
> of migrate_{enable, disable}. So far, I am still unsure if it is possible
> to run another userspace task on the same CPU while getting preempted
> when the migration is disabled.

Yes, a thread in a migrate_disable()-protected critical section can be
preempted, and another thread on the same CPU may enter the critical
section.

If it's necessary to prevent this from happening, then you need to also
make use of a per-CPU mutex.  On RT, this would do the right thing
w.r.t. priority inheritance.
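
A hypothetical sketch of combining the two in the kernel_neon_{disable,
enable} helpers proposed above (illustrative only; the per-CPU mutexes
would still need to be mutex_init()'d at boot, which is omitted here):

    static DEFINE_PER_CPU(struct mutex, fpsimd_lock);

    static void kernel_neon_disable(void)
    {
            migrate_disable();                      /* stay on this CPU */
            mutex_lock(this_cpu_ptr(&fpsimd_lock)); /* exclude other tasks
                                                       on the same CPU */
    }

    static void kernel_neon_enable(void)
    {
            mutex_unlock(this_cpu_ptr(&fpsimd_lock));
            migrate_enable();
    }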

   Julia


Re: Re: [PATCH v3 1/3] KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlock

2019-02-01 Thread Julia Cartwright
On Fri, Feb 01, 2019 at 03:30:58PM +, Julien Grall wrote:
> Hi Julien,
> 
> On 07/01/2019 15:06, Julien Thierry wrote:
> > vgic_irq->irq_lock must always be taken with interrupts disabled as
> > it is used in interrupt context.
> 
> I am a bit confused with the reason here. The code mention that ap_list_lock
> could be taken from the timer interrupt handler interrupt. I assume it
> speaks about the handler kvm_arch_timer_handler. Looking at the
> configuration of the interrupt, the flag IRQF_NO_THREAD is not set, so the
> interrupt should be threaded when CONFIG_PREEMPT_FULL is set. If my
> understanding is correct, this means the interrupt thread would sleep if it
> takes the spinlock.
> 
> Did I miss anything? Do you have an exact path where the vGIC is actually
> called from an interrupt context?

The part you're missing is that percpu interrupts are not force
threaded:

    static int irq_setup_forced_threading(struct irqaction *new)
    {
            if (!force_irqthreads)
                    return 0;
            if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))
                    return 0;

            /* ... */
    }

   Julia


[ANNOUNCE] 4.9.146-rt125

2018-12-20 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.9.146-rt125 stable release.

Apologies for an update to the 4.9-rt tree being way overdue.

This release is just an update to the new stable 4.9.146 version
and no RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: 5b5e350d8c82f5dcbad136aeede6b73d41cd1e8a

Or to build 4.9.146-rt125 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.146.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.146-rt125.patch.xz

Enjoy, and happy holidays,
Julia



[PATCH 0/2] Fix watchdogd wakeup deferral on RT

2018-09-28 Thread Julia Cartwright
The following two patches solve an issue reported by Steffen Trumtrar
and Tim Sander to the linux-rt-users mailing list.  Namely, the wakeup
of the watchdogd thread can be starved by an RT task because of the
hrtimer deferral through ktimersoftd (on PREEMPT_RT_FULL).

The first patch adjusts the kthread_worker locking to make use of a raw
spinlock, making it suitable for work item queueing from hardirq context
on PREEMPT_RT.  This patch can be applied directly to mainline without
any functional change.

The second patch adjusts the hrtimer used by the watchdog core to be a
_HARD timer (and thus not deferred through ktimersoftd w/ PREEMPT_RT).
This patch depends on hrtimer patches carried in the RT patch, and so
should therefore land there.

Cc: Guenter Roeck 
Cc: Steffen Trumtrar 
Cc: Tim Sander 

Julia Cartwright (2):
  kthread: convert worker lock to raw spinlock
  watchdog, rt: prevent deferral of watchdogd wakeup

 drivers/watchdog/watchdog_dev.c |  2 +-
 include/linux/kthread.h |  2 +-
 kernel/kthread.c| 42 -
 3 files changed, 23 insertions(+), 23 deletions(-)

-- 
2.18.0



[PATCH 1/2] kthread: convert worker lock to raw spinlock

2018-09-28 Thread Julia Cartwright
In order to enable the queuing of kthread work items from hardirq
context even when PREEMPT_RT_FULL is enabled, convert the worker
spin_lock to a raw_spin_lock.

This is only acceptable to do because the work performed under the lock
is well-bounded and minimal.

Cc: Sebastian Andrzej Siewior 
Cc: Guenter Roeck 
Reported-and-tested-by: Steffen Trumtrar 
Reported-by: Tim Sander 
Signed-off-by: Julia Cartwright 
---
 include/linux/kthread.h |  2 +-
 kernel/kthread.c| 42 -
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index c1961761311d..ad292898f7f2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,7 +85,7 @@ enum {
 
 struct kthread_worker {
unsigned int flags;
-   spinlock_t  lock;
+   raw_spinlock_t  lock;
struct list_head work_list;
struct list_head delayed_work_list;
struct task_struct  *task;
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 486dedbd9af5..c1d9ee6671c6 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -597,7 +597,7 @@ void __kthread_init_worker(struct kthread_worker *worker,
struct lock_class_key *key)
 {
memset(worker, 0, sizeof(struct kthread_worker));
-   spin_lock_init(&worker->lock);
+   raw_spin_lock_init(&worker->lock);
lockdep_set_class_and_name(&worker->lock, key, name);
INIT_LIST_HEAD(&worker->work_list);
INIT_LIST_HEAD(&worker->delayed_work_list);
@@ -639,21 +639,21 @@ int kthread_worker_fn(void *worker_ptr)
 
if (kthread_should_stop()) {
__set_current_state(TASK_RUNNING);
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
worker->task = NULL;
-   spin_unlock_irq(&worker->lock);
+   raw_spin_unlock_irq(&worker->lock);
return 0;
}
 
work = NULL;
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
if (!list_empty(&worker->work_list)) {
work = list_first_entry(&worker->work_list,
struct kthread_work, node);
list_del_init(&work->node);
}
worker->current_work = work;
-   spin_unlock_irq(&worker->lock);
+   raw_spin_unlock_irq(&worker->lock);
 
if (work) {
__set_current_state(TASK_RUNNING);
@@ -810,12 +810,12 @@ bool kthread_queue_work(struct kthread_worker *worker,
bool ret = false;
unsigned long flags;
 
-   spin_lock_irqsave(&worker->lock, flags);
+   raw_spin_lock_irqsave(&worker->lock, flags);
if (!queuing_blocked(worker, work)) {
kthread_insert_work(worker, work, &worker->work_list);
ret = true;
}
-   spin_unlock_irqrestore(&worker->lock, flags);
+   raw_spin_unlock_irqrestore(&worker->lock, flags);
return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_work);
@@ -841,7 +841,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
if (WARN_ON_ONCE(!worker))
return;
 
-   spin_lock(&worker->lock);
+   raw_spin_lock(&worker->lock);
/* Work must not be used with >1 worker, see kthread_queue_work(). */
WARN_ON_ONCE(work->worker != worker);
 
@@ -850,7 +850,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
list_del_init(&work->node);
kthread_insert_work(worker, work, &worker->work_list);
 
-   spin_unlock(&worker->lock);
+   raw_spin_unlock(&worker->lock);
 }
 EXPORT_SYMBOL(kthread_delayed_work_timer_fn);
 
@@ -906,14 +906,14 @@ bool kthread_queue_delayed_work(struct kthread_worker 
*worker,
unsigned long flags;
bool ret = false;
 
-   spin_lock_irqsave(&worker->lock, flags);
+   raw_spin_lock_irqsave(&worker->lock, flags);
 
if (!queuing_blocked(worker, work)) {
__kthread_queue_delayed_work(worker, dwork, delay);
ret = true;
}
 
-   spin_unlock_irqrestore(&worker->lock, flags);
+   raw_spin_unlock_irqrestore(&worker->lock, flags);
return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_delayed_work);
@@ -949,7 +949,7 @@ void kthread_flush_work(struct kthread_work *work)
if (!worker)
return;
 
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
/* Work must not be used with >1 worker, see kthread_queue_work(). */
WARN_ON_ONCE(work->worker != worker);
 
@@ -961,7 +961,7 @@ void kthread_flush_work(struct k

[PATCH RT 2/2] watchdog, rt: prevent deferral of watchdogd wakeup

2018-09-28 Thread Julia Cartwright
When PREEMPT_RT_FULL is enabled, all hrtimer expiry functions are
deferred for execution into the context of ktimersoftd unless otherwise
annotated.

Deferring the expiry of the hrtimer used by the watchdog core, however,
is a waste, as the callback does nothing but queue a kthread work item
and wake up watchdogd.

It's worse than that, too: the deferral through ktimersoftd also means
that, for correct behavior, a user must adjust the scheduling parameters
of both watchdogd _and_ ktimersoftd, which is unnecessary and has other
side effects (such as causing unrelated expiry functions to execute at
potentially elevated priority).

Instead, mark the hrtimer used by the watchdog core as _HARD to allow
its execution directly from hardirq context.  The work done in this
expiry function is well-bounded and minimal.

A user still must adjust the scheduling parameters of the watchdogd
to be correct w.r.t. their application needs.

Cc: Guenter Roeck 
Reported-and-tested-by: Steffen Trumtrar 
Reported-by: Tim Sander 
Signed-off-by: Julia Cartwright 
---
 drivers/watchdog/watchdog_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index ffbdc4642ea5..9c2b3e5cebdc 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -945,7 +945,7 @@ static int watchdog_cdev_register(struct watchdog_device 
*wdd, dev_t devno)
return -ENODEV;
 
kthread_init_work(&wd_data->work, watchdog_ping_work);
-   hrtimer_init(&wd_data->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer_init(&wd_data->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
wd_data->timer.function = watchdog_timer_expired;
 
if (wdd->id == 0) {
-- 
2.18.0



Re: [ANNOUNCE] Submit a topic for the RT Microconference at Linux Plumbers

2018-09-07 Thread Julia Cartwright
Hello all-

On Tue, Sep 04, 2018 at 01:25:29PM -0400, Steven Rostedt wrote:
> Hi RT folks!
>
> The call for proposals (CfP) is now open for the RT Microconference at
> Linux Plumbers in Vancouver, Canada. The topics we are looking at this
> year are:
>
>  - How will PREEMPT_RT be maintained (is it a subsystem?)
>
>  - Who will maintain it?
>
>  - How do we teach the rest of the kernel developers how not to break 
> PREEMPT_RT?
>
>  - How do we catch when it breaks?
>
>  - What tools can we add into tools/ that other kernel developers can use to 
> test and learn about PREEMPT_RT?
>
>  - What tests can we add into tools/testing/selftests?
>
>  - What kernel boot selftests can be added?
>
>  - How can we modify lockdep to catch problems that can break PREEMPT_RT?
>
>  - Discuss various types of failures that can happen with PREEMPT_RT that 
> normally would not happen in the vanilla kernel.
>
>  - How do we handle stable backports?
>
>  - Interaction with Safety Critical domains?
>
>  - Interrupt threads are RT and are not protected by the RT Throttling.
>How can we prevent interrupt thread starvation from a rogue RT task?
>
>  - How to determine if a kernel is booted with PREEMPT_RT enabled?
>
>  - librtpi: "solving" long-standing glibc pthread condvar internal lock 
> inversion with a new implementation; motivations, technical details, etc.
>
>  - SCHED_DEADLINE - Lots of research has been made to various issues like CPU 
> affinity, Bandwidth inheritance, cgroup support; but nothing has been done to 
> the kernel. Let's make a commitment and push these ideas forward into 
> mainline.
>
>  - How can we modify lockdep to catch problems that can break PREEMPT_RT?
>
>  - Discuss various types of failures that can happen with PREEMPT_RT that 
> normally would not happen in the vanilla kernel.

There was some confusion as to whether or not a topic being listed above
automatically meant that it was to be included in the RT Microconference
agenda.

The answer is: NO.  If you would like to see a specific topic discussed,
either from this list or otherwise, you must submit it via the portal
listed below.  Only topics submitted to the portal will be considered
for the RT Microconference agenda.

Note also that submitting a topic means that you will be responsible
for introducing the necessary context to the group to facilitate
further discussion.

> If you want to lead one of the above sessions, or if you have a new
> session you want to submit, please do so here:
>
>   https://linuxplumbersconf.org/event/2/abstracts/
>
> Set up an account, and then click "Submit new proposal".
>
> Fill out the "Title", "Content", select the "Authors" as the one leading
> the session (yourself), and then for "Track" select "RT MC Topics CfP".
> Select "Yes" to agree with the anti-harassment policy, and then hit
> Submit.

Thanks, and see you there!

   Julia


[PATCH RT 11/22] mm/slub: close possible memory-leak in kmem_cache_alloc_bulk()

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
Under certain circumstances we could leak elements which were moved to
the local "to_free" list. The damage is limited since I can't find
any users here.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 5022166d3b225bf5e343efb3ea01b3c5a41d69ba)
Signed-off-by: Julia Cartwright 
---
 mm/slub.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/slub.c b/mm/slub.c
index 67eb368b9314..738b2bccbd5f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3209,6 +3209,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t 
flags, size_t size,
return i;
 error:
local_irq_enable();
+   free_delayed(&to_free);
slab_post_alloc_hook(s, flags, i, p);
__kmem_cache_free_bulk(s, i, p);
return 0;
-- 
2.18.0



[ANNOUNCE] 4.9.115-rt93

2018-08-06 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.9.115-rt93 stable release.

This release is just an update to the new stable 4.9.115 version
and no RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: 236d20fd941a444577579ba398ba9907f3b33277

Or to build 4.9.115-rt93 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.115.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.115-rt93.patch.xz

Enjoy!
Julia



[PATCH RT 10/22] arm*: disable NEON in kernel mode

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
NEON in kernel mode is used by the crypto algorithms and raid6 code.
While the raid6 code looks okay, the crypto algorithms do not: NEON
is enabled on first invocation and may allocate/free/map memory before
the NEON mode is disabled again.
This needs to be changed until it can be enabled.
On ARM NEON in kernel mode can be simply disabled. on ARM64 it needs to
stay on due to possible EFI callbacks so here I disable each algorithm.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit b3a776555e0d465df138d254d6dc3ac1b718ac6d)
Signed-off-by: Julia Cartwright 
---
 arch/arm/Kconfig  |  2 +-
 arch/arm64/crypto/Kconfig | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5715844e83e3..8c40f7d73251 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2158,7 +2158,7 @@ config NEON
 
 config KERNEL_MODE_NEON
bool "Support for NEON in kernel mode"
-   depends on NEON && AEABI
+   depends on NEON && AEABI && !PREEMPT_RT_BASE
help
  Say Y to include support for NEON in kernel mode.
 
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2cf32e9887e1..cd71b3432720 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -10,41 +10,41 @@ if ARM64_CRYPTO
 
 config CRYPTO_SHA1_ARM64_CE
tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_HASH
 
 config CRYPTO_SHA2_ARM64_CE
tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_HASH
 
 config CRYPTO_GHASH_ARM64_CE
tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_HASH
 
 config CRYPTO_AES_ARM64_CE
tristate "AES core cipher using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_ALGAPI
 
 config CRYPTO_AES_ARM64_CE_CCM
tristate "AES in CCM mode using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_ALGAPI
select CRYPTO_AES_ARM64_CE
select CRYPTO_AEAD
 
 config CRYPTO_AES_ARM64_CE_BLK
tristate "AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_BLKCIPHER
select CRYPTO_AES_ARM64_CE
select CRYPTO_ABLK_HELPER
 
 config CRYPTO_AES_ARM64_NEON_BLK
tristate "AES in ECB/CBC/CTR/XTS modes using NEON instructions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_BLKCIPHER
select CRYPTO_AES
select CRYPTO_ABLK_HELPER
-- 
2.18.0



[PATCH RT 17/22] alarmtimer: Prevent live lock in alarm_cancel()

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
If alarm_try_to_cancel() requires a retry, then depending on the
priority setting the retry loop might prevent timer callback completion
on RT. Prevent that by waiting for completion on RT, no change for a
non RT kernel.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 51e376c469bf05f32cb1ceb9e39d31bb92f1f6c8)
Signed-off-by: Julia Cartwright 
---
 kernel/time/alarmtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index d67ef56ca9bc..61b20a656863 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -407,7 +407,7 @@ int alarm_cancel(struct alarm *alarm)
int ret = alarm_try_to_cancel(alarm);
if (ret >= 0)
return ret;
-   cpu_relax();
+   hrtimer_wait_for_timer(&alarm->timer);
}
 }
 EXPORT_SYMBOL_GPL(alarm_cancel);
-- 
2.18.0



[PATCH RT 18/22] posix-timers: move the rcu head out of the union

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
On RT the timer can be preempted while running and therefore we wait
with timer_wait_for_callback() for the timer to complete (instead of
busy looping). The RCU-readlock is held to ensure that this posix timer
is not removed while we wait on it.
If the timer is removed then it invokes call_rcu() with a pointer that
is shared with the hrtimer because it is part of the same union.
In order to avoid any possible side effects I am moving the rcu pointer
out of the union.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit b8401365af110949f12c7cf1fa86b4c0ea069bbd)
Signed-off-by: Julia Cartwright 
---
 include/linux/posix-timers.h | 2 +-
 kernel/time/posix-timers.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 62d44c176071..cbd3f9334543 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -92,8 +92,8 @@ struct k_itimer {
struct alarm alarmtimer;
ktime_t interval;
} alarm;
-   struct rcu_head rcu;
} it;
+   struct rcu_head rcu;
 };
 
 struct k_clock {
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 6084618436fd..45d8033caec4 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -568,7 +568,7 @@ static struct k_itimer * alloc_posix_timer(void)
 
 static void k_itimer_rcu_free(struct rcu_head *head)
 {
-   struct k_itimer *tmr = container_of(head, struct k_itimer, it.rcu);
+   struct k_itimer *tmr = container_of(head, struct k_itimer, rcu);
 
kmem_cache_free(posix_timers_cache, tmr);
 }
@@ -585,7 +585,7 @@ static void release_posix_timer(struct k_itimer *tmr, int 
it_id_set)
}
put_pid(tmr->it_pid);
sigqueue_free(tmr->sigq);
-   call_rcu(&tmr->it.rcu, k_itimer_rcu_free);
+   call_rcu(&tmr->rcu, k_itimer_rcu_free);
 }
 
 static struct k_clock *clockid_to_kclock(const clockid_t id)
-- 
2.18.0



[PATCH RT 19/22] locallock: provide {get,put}_locked_ptr() variants

2018-08-06 Thread Julia Cartwright
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
Provide a set of locallocked accessors for pointers to per-CPU data;
this is useful for dynamically-allocated per-CPU regions, for example.

These are symmetric with the {get,put}_cpu_ptr() per-CPU accessor
variants.
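
A hypothetical usage sketch (the names are invented for illustration and
assume a region allocated with alloc_percpu()):

    static DEFINE_LOCAL_IRQ_LOCK(my_lock);
    static struct my_state __percpu *my_state;

    static void my_update(void)
    {
            struct my_state *s;

            s = get_locked_ptr(my_lock, my_state);  /* local_lock() + this_cpu_ptr() */
            s->count++;
            put_locked_ptr(my_lock, my_state);      /* local_unlock() */
    }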

Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 3d45cf23db4f76cd356ebb0aa4cdaa7d92d1a64e)
Signed-off-by: Julia Cartwright 
---
 include/linux/locallock.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index 280f884a05a3..0c3ff5b23f6a 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -238,6 +238,14 @@ static inline int __local_unlock_irqrestore(struct 
local_irq_lock *lv,
 
 #define put_locked_var(lvar, var)  local_unlock(lvar);
 
+#define get_locked_ptr(lvar, var)  \
+   ({  \
+   local_lock(lvar);   \
+   this_cpu_ptr(var);  \
+   })
+
+#define put_locked_ptr(lvar, var)  local_unlock(lvar);
+
 #define local_lock_cpu(lvar)   \
({  \
local_lock(lvar);   \
@@ -278,6 +286,8 @@ static inline void local_irq_lock_init(int lvar) { }
 
 #define get_locked_var(lvar, var)  get_cpu_var(var)
 #define put_locked_var(lvar, var)  put_cpu_var(var)
+#define get_locked_ptr(lvar, var)  get_cpu_ptr(var)
+#define put_locked_ptr(lvar, var)  put_cpu_ptr(var)
 
 #define local_lock_cpu(lvar)   get_cpu()
 #define local_unlock_cpu(lvar) put_cpu()
-- 
2.18.0



[PATCH RT 06/22] rcu: Do not include rtmutex_common.h unconditionally

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ upstream commit b88697810d7c1d102a529990f9071b0f14cfe6df ]

This commit adjusts include files and provides definitions in preparation
for suppressing lockdep false-positive ->boost_mtx complaints.  Without
this preparation, architectures not supporting rt_mutex will get build
failures.

Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Paul E. McKenney 
Signed-off-by: Julia Cartwright 
---
 kernel/rcu/tree_plugin.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a1a7bafcce15..3d18d08e8382 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -37,6 +37,7 @@
  * This probably needs to be excluded from -rt builds.
  */
 #define rt_mutex_owner(a) ({ WARN_ON_ONCE(1); NULL; })
+#define rt_mutex_futex_unlock(x) WARN_ON_ONCE(1)
 
 #endif /* #else #ifdef CONFIG_RCU_BOOST */
 
@@ -834,8 +835,6 @@ static void rcu_cpu_kthread_setup(unsigned int cpu)
 
 #ifdef CONFIG_RCU_BOOST
 
-#include "../locking/rtmutex_common.h"
-
 #ifdef CONFIG_RCU_TRACE
 
 static void rcu_initiate_boost_trace(struct rcu_node *rnp)
-- 
2.18.0



[PATCH RT 12/22] locking: add types.h

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
During the stable update the arm architecture did not compile anymore
due to missing definition of u16/32.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 1289b06974d64f244a26455fab699c6a1332f4bc)
Signed-off-by: Julia Cartwright 
---
 include/linux/spinlock_types_raw.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/spinlock_types_raw.h 
b/include/linux/spinlock_types_raw.h
index edffc4d53fc9..03235b475b77 100644
--- a/include/linux/spinlock_types_raw.h
+++ b/include/linux/spinlock_types_raw.h
@@ -1,6 +1,8 @@
 #ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 #define __LINUX_SPINLOCK_TYPES_RAW_H
 
+#include 
+
 #if defined(CONFIG_SMP)
 # include 
 #else
-- 
2.18.0



[PATCH RT 09/22] crypto: limit more FPU-enabled sections

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
  should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
  kmap_atomic().

The whole kernel_fpu_begin()/end() processing isn't probably that cheap.
It most likely makes sense to process as much of those as possible in one
go. The new *_fpu_sched_rt() schedules only if a RT task is pending.

We should probably measure the performance of those ciphers in pure SW
mode and with these optimisations to see if it makes sense to keep them
for RT.

This kernel_fpu_resched() makes the code more preemptible which might hurt
performance.
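
The hunk adding kernel_fpu_resched() to arch/x86/kernel/fpu/core.c is cut
off in this archive; judging from the description it presumably has roughly
the following shape (a guess, not the verbatim patch):

    void kernel_fpu_resched(void)
    {
            if (should_resched(PREEMPT_OFFSET)) {
                    kernel_fpu_end();       /* re-enable preemption     */
                    cond_resched();         /* let the pending task run */
                    kernel_fpu_begin();     /* resume the FPU section   */
            }
    }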

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 0dcc4c1693ef37e166da420ef7c68c7047c996f1)
Signed-off-by: Julia Cartwright 
---
 arch/x86/crypto/camellia_aesni_avx2_glue.c | 20 
 arch/x86/crypto/camellia_aesni_avx_glue.c  | 19 +++
 arch/x86/crypto/cast6_avx_glue.c   | 24 +++
 arch/x86/crypto/chacha20_glue.c|  9 
 arch/x86/crypto/serpent_avx2_glue.c| 19 +++
 arch/x86/crypto/serpent_avx_glue.c | 23 ++
 arch/x86/crypto/serpent_sse2_glue.c| 23 ++
 arch/x86/crypto/twofish_avx_glue.c | 27 --
 arch/x86/include/asm/fpu/api.h |  1 +
 arch/x86/kernel/fpu/core.c | 12 ++
 10 files changed, 158 insertions(+), 19 deletions(-)

diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c 
b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index 60907c139c4e..0902db7d326a 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -206,6 +206,20 @@ struct crypt_priv {
bool fpu_enabled;
 };
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+static void camellia_fpu_end_rt(struct crypt_priv *ctx)
+{
+   bool fpu_enabled = ctx->fpu_enabled;
+
+   if (!fpu_enabled)
+   return;
+   camellia_fpu_end(fpu_enabled);
+   ctx->fpu_enabled = false;
+}
+#else
+static void camellia_fpu_end_rt(struct crypt_priv *ctx) { }
+#endif
+
 static void encrypt_callback(void *priv, u8 *srcdst, unsigned int nbytes)
 {
const unsigned int bsize = CAMELLIA_BLOCK_SIZE;
@@ -221,16 +235,19 @@ static void encrypt_callback(void *priv, u8 *srcdst, 
unsigned int nbytes)
}
 
if (nbytes >= CAMELLIA_AESNI_PARALLEL_BLOCKS * bsize) {
+   kernel_fpu_resched();
camellia_ecb_enc_16way(ctx->ctx, srcdst, srcdst);
srcdst += bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS;
nbytes -= bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS;
}
 
while (nbytes >= CAMELLIA_PARALLEL_BLOCKS * bsize) {
+   kernel_fpu_resched();
camellia_enc_blk_2way(ctx->ctx, srcdst, srcdst);
srcdst += bsize * CAMELLIA_PARALLEL_BLOCKS;
nbytes -= bsize * CAMELLIA_PARALLEL_BLOCKS;
}
+   camellia_fpu_end_rt(ctx);
 
for (i = 0; i < nbytes / bsize; i++, srcdst += bsize)
camellia_enc_blk(ctx->ctx, srcdst, srcdst);
@@ -251,16 +268,19 @@ static void decrypt_callback(void *priv, u8 *srcdst, 
unsigned int nbytes)
}
 
if (nbytes >= CAMELLIA_AESNI_PARALLEL_BLOCKS * bsize) {
+   kernel_fpu_resched();
camellia_ecb_dec_16way(ctx->ctx, srcdst, srcdst);
srcdst += bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS;
nbytes -= bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS;
}
 
while (nbytes >= CAMELLIA_PARALLEL_BLOCKS * bsize) {
+   kernel_fpu_resched();
camellia_dec_blk_2way(ctx->ctx, srcdst, srcdst);
srcdst += bsize * CAMELLIA_PARALLEL_BLOCKS;
nbytes -= bsize * CAMELLIA_PARALLEL_BLOCKS;
}
+   camellia_fpu_end_rt(ctx);
 
for (i = 0; i < nbytes / bsize; i++, srcdst += bsize)
camellia_dec_blk(ctx->ctx, srcdst, srcdst);
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c 
b/arch/x86/crypto/camellia_aesni_avx_glue.c
index d96429da88eb..3b8e91841039 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -210,6 +210,21 @@ struct crypt_priv {
bool fpu_enabled;
 };
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+static void camellia_fpu_end_rt(struct crypt_priv *ctx)
+{
+   bool fpu_enabled = ctx->fpu_enabled;
+
+   if (!fpu_enabled)
+   re

[PATCH RT 07/22] rcu: Suppress lockdep false-positive ->boost_mtx complaints

2018-08-06 Thread Julia Cartwright
From: "Paul E. McKenney" 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit 02a7c234e54052101164368ff981bd72f7acdd65 ]

RCU priority boosting uses rt_mutex_init_proxy_locked() to initialize an
rt_mutex structure in locked state held by some other task.  When that
other task releases it, lockdep complains (quite accurately, but a bit
uselessly) that the other task never acquired it.  This complaint can
suppress other, more helpful, lockdep complaints, and in any case it is
a false positive.

This commit therefore switches from rt_mutex_unlock() to
rt_mutex_futex_unlock(), thereby avoiding the lockdep annotations.
Of course, if lockdep ever learns about rt_mutex_init_proxy_locked(),
additional adjustments will be required.

Suggested-by: Peter Zijlstra 
Signed-off-by: Paul E. McKenney 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/rcu/tree_plugin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3d18d08e8382..510de72ad8a3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -486,7 +486,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 
/* Unboost if we were boosted. */
if (IS_ENABLED(CONFIG_RCU_BOOST) && drop_boost_mutex)
-   rt_mutex_unlock(&rnp->boost_mtx);
+   rt_mutex_futex_unlock(&rnp->boost_mtx);
 
/*
 * If this was the last task on the expedited lists,
-- 
2.18.0



[PATCH RT 04/22] futex: Fix OWNER_DEAD fixup

2018-08-06 Thread Julia Cartwright
From: Peter Zijlstra 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit a97cb0e7b3f4c6297fd857055ae8e895f402f501 ]

Both Geert and DaveJ reported that the recent futex commit:

  c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")

introduced a problem with setting OWNER_DEAD. We set the bit on an
uninitialized variable and then entirely optimize it away as a
dead-store.

Move the setting of the bit to where it is more useful.

Reported-by: Geert Uytterhoeven 
Reported-by: Dave Jones 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Link: 
http://lkml.kernel.org/r/20180122103947.gd2...@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/futex.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index cdd68ba6e3a6..57038131ad3f 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2301,9 +2301,6 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct 
futex_q *q,
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 
oldowner = pi_state->owner;
-   /* Owner died? */
-   if (!pi_state->owner)
-   newtid |= FUTEX_OWNER_DIED;
 
/*
 * We are here because either:
@@ -2364,6 +2361,9 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct 
futex_q *q,
}
 
newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+   /* Owner died? */
+   if (!pi_state->owner)
+   newtid |= FUTEX_OWNER_DIED;
 
if (get_futex_value_locked(&uval, uaddr))
goto handle_fault;
-- 
2.18.0



[PATCH RT 16/22] block: blk-mq: move blk_queue_usage_counter_release() into process context

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
| BUG: sleeping function called from invalid context at 
kernel/locking/rtmutex.c:914
| in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6
| 5 locks held by kworker/u257:6/255:
|  #0:  ("events_unbound"){.+.+.+}, at: [] 
process_one_work+0x171/0x5e0
|  #1:  ((&entry->work)){+.+.+.}, at: [] 
process_one_work+0x171/0x5e0
|  #2:  (&shost->scan_mutex){+.+.+.}, at: [] 
__scsi_add_device+0xa3/0x130 [scsi_mod]
|  #3:  (&set->tag_list_lock){+.+...}, at: [] 
blk_mq_init_queue+0x96a/0xa50
|  #4:  (rcu_read_lock_sched){..}, at: [] 
percpu_ref_kill_and_confirm+0x1d/0x120
| Preemption disabled at:[] 
blk_mq_freeze_queue_start+0x56/0x70
|
| CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1
| Workqueue: events_unbound async_run_entry_fn
|  0003 8800bc29f998 815b3a12 
|   8800bc29f9b8 8109aa16 8800bc29fa28
|  8800bc5d1bc8 8800bc29f9e8 815b8dd4 8800
| Call Trace:
|  [] dump_stack+0x4f/0x7c
|  [] __might_sleep+0x116/0x190
|  [] rt_spin_lock+0x24/0x60
|  [] __wake_up+0x29/0x60
|  [] blk_mq_usage_counter_release+0x1e/0x20
|  [] percpu_ref_kill_and_confirm+0x106/0x120
|  [] blk_mq_freeze_queue_start+0x56/0x70
|  [] blk_mq_update_tag_set_depth+0x40/0xd0
|  [] blk_mq_init_queue+0x98c/0xa50
|  [] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod]
|  [] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod]
|  [] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod]
|  [] __scsi_add_device+0x126/0x130 [scsi_mod]
|  [] ata_scsi_scan_host+0xaf/0x200 [libata]
|  [] async_port_probe+0x46/0x60 [libata]
|  [] async_run_entry_fn+0x3b/0xf0
|  [] process_one_work+0x201/0x5e0

percpu_ref_kill_and_confirm() invokes blk_mq_usage_counter_release() in
a rcu-sched region. swait based wake queue can't be used due to
wake_up_all() usage and disabled interrupts in !RT configs (as reported
by Corey Minyard).
The wq_has_sleeper() check has been suggested by Peter Zijlstra.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 2d701058d614554cce412a787f00568b9fdffade)
Signed-off-by: Julia Cartwright 
---
 block/blk-core.c   | 14 +-
 include/linux/blkdev.h |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 87d3e0a503e5..346d5bba3948 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -675,12 +675,21 @@ void blk_queue_exit(struct request_queue *q)
percpu_ref_put(&q->q_usage_counter);
 }
 
+static void blk_queue_usage_counter_release_swork(struct swork_event *sev)
+{
+   struct request_queue *q =
+   container_of(sev, struct request_queue, mq_pcpu_wake);
+
+   wake_up_all(&q->mq_freeze_wq);
+}
+
 static void blk_queue_usage_counter_release(struct percpu_ref *ref)
 {
struct request_queue *q =
container_of(ref, struct request_queue, q_usage_counter);
 
-   wake_up_all(&q->mq_freeze_wq);
+   if (wq_has_sleeper(&q->mq_freeze_wq))
+   swork_queue(&q->mq_pcpu_wake);
 }
 
 static void blk_rq_timed_out_timer(unsigned long data)
@@ -751,6 +760,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
 
init_waitqueue_head(&q->mq_freeze_wq);
+   INIT_SWORK(&q->mq_pcpu_wake, blk_queue_usage_counter_release_swork);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
@@ -3556,6 +3566,8 @@ int __init blk_dev_init(void)
if (!kblockd_workqueue)
panic("Failed to create kblockd\n");
 
+   BUG_ON(swork_get());
+
request_cachep = kmem_cache_create("blkdev_requests",
sizeof(struct request), 0, SLAB_PANIC, NULL);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fdb449fe3ff7..ab039211ab9f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct module;
 struct scsi_ioctl_command;
@@ -469,6 +470,7 @@ struct request_queue {
 #endif
struct rcu_head rcu_head;
wait_queue_head_t   mq_freeze_wq;
+   struct swork_event  mq_pcpu_wake;
struct percpu_ref   q_usage_counter;
struct list_headall_q_node;
 
-- 
2.18.0



[PATCH RT 02/22] futex: Fix more put_pi_state() vs. exit_pi_state_list() races

2018-08-06 Thread Julia Cartwright
From: Peter Zijlstra 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit 51d00899f7e6ded15c89cb4e2cb11a35283bac81 ]

Dmitry (through syzbot) reported being able to trigger the WARN in
get_pi_state() and a use-after-free on:

raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);

Both are due to this race:

  exit_pi_state_list()  put_pi_state()

  lock(&curr->pi_lock)
  while() {
pi_state = list_first_entry(head);
hb = hash_futex(&pi_state->key);
unlock(&curr->pi_lock);


dec_and_test(&pi_state->refcount);

lock(&hb->lock)
lock(&pi_state->pi_mutex.wait_lock) // uaf if pi_state free'd
lock(&curr->pi_lock);



unlock(&curr->pi_lock);
get_pi_state(); // WARN; refcount==0

The problem is we take the reference count too late, and don't allow it
being 0. Fix it by using inc_not_zero() and simply retrying the loop
when we fail to get a refcount. In that case put_pi_state() should
remove the entry from the list.

Reported-by: Dmitry Vyukov 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Thomas Gleixner 
Cc: Gratian Crisan 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: dvh...@infradead.org
Cc: syzbot 

Cc: syzkaller-b...@googlegroups.com
Cc: 
Fixes: c74aef2d06a9 ("futex: Fix pi_state->owner serialization")
Link: 
http://lkml.kernel.org/r/20171031101853.xpfh72y643kdf...@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/futex.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 47e42faad6c5..270148be5647 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -899,11 +899,27 @@ void exit_pi_state_list(struct task_struct *curr)
 */
raw_spin_lock_irq(&curr->pi_lock);
while (!list_empty(head)) {
-
next = head->next;
pi_state = list_entry(next, struct futex_pi_state, list);
key = pi_state->key;
hb = hash_futex(&key);
+
+   /*
+* We can race against put_pi_state() removing itself from the
+* list (a waiter going away). put_pi_state() will first
+* decrement the reference count and then modify the list, so
+* its possible to see the list entry but fail this reference
+* acquire.
+*
+* In that case; drop the locks to let put_pi_state() make
+* progress and retry the loop.
+*/
+   if (!atomic_inc_not_zero(&pi_state->refcount)) {
+   raw_spin_unlock_irq(&curr->pi_lock);
+   cpu_relax();
+   raw_spin_lock_irq(&curr->pi_lock);
+   continue;
+   }
raw_spin_unlock_irq(&curr->pi_lock);
 
spin_lock(&hb->lock);
@@ -914,10 +930,12 @@ void exit_pi_state_list(struct task_struct *curr)
 * task still owns the PI-state:
 */
if (head->next != next) {
+   /* retain curr->pi_lock for the loop invariant */
raw_spin_unlock(&curr->pi_lock);
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
+   put_pi_state(pi_state);
continue;
}
 
@@ -925,9 +943,8 @@ void exit_pi_state_list(struct task_struct *curr)
WARN_ON(list_empty(&pi_state->list));
list_del_init(&pi_state->list);
pi_state->owner = NULL;
-   raw_spin_unlock(&curr->pi_lock);
 
-   get_pi_state(pi_state);
+   raw_spin_unlock(&curr->pi_lock);
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(&hb->lock);
 
-- 
2.18.0



[PATCH RT 21/22] seqlock: provide the same ordering semantics as mainline

2018-08-06 Thread Julia Cartwright
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
The mainline implementation of read_seqbegin() orders prior loads w.r.t.
the read-side critical section.  Fixup the RT writer-boosting
implementation to provide the same guarantee.

Also, while we're here, update the usage of ACCESS_ONCE() to use
READ_ONCE().

Fixes: e69f15cf77c23 ("seqlock: Prevent rt starvation")
Cc: stable...@vger.kernel.org
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit afa4c06b89a3c0fb7784ff900ccd707bef519cb7)
Signed-off-by: Julia Cartwright 
---
 include/linux/seqlock.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 3d7223ffdd3b..de04d6d0face 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -461,6 +461,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl)
spin_unlock_wait(&sl->lock);
goto repeat;
}
+   smp_rmb();
return ret;
 }
 #endif
-- 
2.18.0



[PATCH RT 01/22] futex: Fix pi_state->owner serialization

2018-08-06 Thread Julia Cartwright
From: Peter Zijlstra 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit c74aef2d06a9f59cece89093eecc552933cba72a ]

There was a reported suspicion about a race between exit_pi_state_list()
and put_pi_state(). The same report mentioned the comment with
put_pi_state() said it should be called with hb->lock held, and it no
longer is in all places.

As it turns out, the pi_state->owner serialization is indeed broken. As per
the new rules:

  734009e96d19 ("futex: Change locking rules")

pi_state->owner should be serialized by pi_state->pi_mutex.wait_lock.
For the sites setting pi_state->owner we already hold wait_lock (where
required) but exit_pi_state_list() and put_pi_state() were not and
raced on clearing it.

Fixes: 734009e96d19 ("futex: Change locking rules")
Reported-by: Gratian Crisan 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Cc: dvh...@infradead.org
Cc: sta...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/20170922154806.jd3ffltfk24m4...@hirez.programming.kicks-ass.net
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/futex.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 8ab0ddd4cf8f..47e42faad6c5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -819,8 +819,6 @@ static void get_pi_state(struct futex_pi_state *pi_state)
 /*
  * Drops a reference to the pi_state object and frees or caches it
  * when the last reference is gone.
- *
- * Must be called with the hb lock held.
  */
 static void put_pi_state(struct futex_pi_state *pi_state)
 {
@@ -835,16 +833,22 @@ static void put_pi_state(struct futex_pi_state *pi_state)
 * and has cleaned up the pi_state already
 */
if (pi_state->owner) {
-   raw_spin_lock_irq(&pi_state->owner->pi_lock);
-   list_del_init(&pi_state->list);
-   raw_spin_unlock_irq(&pi_state->owner->pi_lock);
+   struct task_struct *owner;
 
-   rt_mutex_proxy_unlock(&pi_state->pi_mutex, pi_state->owner);
+   raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+   owner = pi_state->owner;
+   if (owner) {
+   raw_spin_lock(&owner->pi_lock);
+   list_del_init(&pi_state->list);
+   raw_spin_unlock(&owner->pi_lock);
+   }
+   rt_mutex_proxy_unlock(&pi_state->pi_mutex, owner);
+   raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
}
 
-   if (current->pi_state_cache)
+   if (current->pi_state_cache) {
kfree(pi_state);
-   else {
+   } else {
/*
 * pi_state->list is already empty.
 * clear pi_state->owner.
@@ -903,14 +907,15 @@ void exit_pi_state_list(struct task_struct *curr)
raw_spin_unlock_irq(&curr->pi_lock);
 
spin_lock(&hb->lock);
-
-   raw_spin_lock_irq(&curr->pi_lock);
+   raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+   raw_spin_lock(&curr->pi_lock);
/*
 * We dropped the pi-lock, so re-check whether this
 * task still owns the PI-state:
 */
if (head->next != next) {
-   raw_spin_unlock_irq(&curr->pi_lock);
+   raw_spin_unlock(&curr->pi_lock);
+   raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
continue;
@@ -920,9 +925,10 @@ void exit_pi_state_list(struct task_struct *curr)
WARN_ON(list_empty(&pi_state->list));
list_del_init(&pi_state->list);
pi_state->owner = NULL;
-   raw_spin_unlock_irq(&curr->pi_lock);
+   raw_spin_unlock(&curr->pi_lock);
 
get_pi_state(pi_state);
+   raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(&hb->lock);
 
rt_mutex_futex_unlock(&pi_state->pi_mutex);
@@ -1204,6 +1210,10 @@ static int attach_to_pi_owner(u32 uval, union futex_key 
*key,
 
WARN_ON(!list_empty(&pi_state->list));
list_add(&pi_state->list, &p->pi_state_list);
+   /*
+* Assignment without holding pi_state->pi_mutex.wait_lock is safe
+* because there is no concurrency as the object is not published yet.
+*/
pi_state->owner = p;
raw_spin_unlock_irq(&p->pi_lock);
 
-- 
2.18.0



[PATCH RT 14/22] Revert "rt,ntp: Move call to schedule_delayed_work() to helper thread"

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
I've been looking at this in v3.10-RT where it got in. The patch
description says

|The ntp code for notify_cmos_timer() is called from a hard interrupt
|context.

I see only one caller of ntp_notify_cmos_timer() and that is
do_adjtimex() after "raw_spin_unlock_irqrestore()".
I see a few callers of do_adjtimex() which is SYS_adjtimex() (+compat)
and posix_clock_realtime_adj() which in turn is called by
SYS_clock_adjtime().

Reverting the patch.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 932c5783d4434250a1019f49ae81b80731dfd4cd)
Signed-off-by: Julia Cartwright 
---
 kernel/time/ntp.c | 26 --
 1 file changed, 26 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 05b7391bf9bd..6df8927c58a5 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "ntp_internal.h"
 #include "timekeeping_internal.h"
@@ -569,35 +568,10 @@ static void sync_cmos_clock(struct work_struct *work)
   &sync_cmos_work, timespec64_to_jiffies(&next));
 }
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-
-static void run_clock_set_delay(struct swork_event *event)
-{
-   queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
-}
-
-static struct swork_event ntp_cmos_swork;
-
-void ntp_notify_cmos_timer(void)
-{
-   swork_queue(&ntp_cmos_swork);
-}
-
-static __init int create_cmos_delay_thread(void)
-{
-   WARN_ON(swork_get());
-   INIT_SWORK(&ntp_cmos_swork, run_clock_set_delay);
-   return 0;
-}
-early_initcall(create_cmos_delay_thread);
-
-#else
-
 void ntp_notify_cmos_timer(void)
 {
queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
 }
-#endif /* CONFIG_PREEMPT_RT_FULL */
 
 #else
 void ntp_notify_cmos_timer(void) { }
-- 
2.18.0



[PATCH RT 13/22] net: use task_struct instead of CPU number as the queue owner on -RT

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
In commit ("net: move xmit_recursion to per-task variable on -RT") the
recursion level was changed to be per-task since we can get preempted in
BH on -RT. The lock owner should consequently be recorded as the task
that holds the lock and not the CPU. Otherwise we trigger the "Dead loop
on virtual device" warning on SMP systems.

Cc: stable...@vger.kernel.org
Reported-by: Kurt Kanzenbach 
Tested-by: Kurt Kanzenbach 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit d3a66ffd1c4f0253076069b10a8223e7b6e80e38)
Signed-off-by: Julia Cartwright 
---
 include/linux/netdevice.h | 54 ++-
 net/core/dev.c|  6 -
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 85fc72b8a92b..6f1a3f286c4b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -594,7 +594,11 @@ struct netdev_queue {
  * write-mostly part
  */
spinlock_t  _xmit_lock cacheline_aligned_in_smp;
+#ifdef CONFIG_PREEMPT_RT_FULL
+   struct task_struct  *xmit_lock_owner;
+#else
int xmit_lock_owner;
+#endif
/*
 * Time (in jiffies) of last Tx
 */
@@ -3610,41 +3614,79 @@ static inline u32 netif_msg_init(int debug_value, int 
default_msg_enable_bits)
return (1 << debug_value) - 1;
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+static inline void netdev_queue_set_owner(struct netdev_queue *txq, int cpu)
+{
+   txq->xmit_lock_owner = current;
+}
+
+static inline void netdev_queue_clear_owner(struct netdev_queue *txq)
+{
+   txq->xmit_lock_owner = NULL;
+}
+
+static inline bool netdev_queue_has_owner(struct netdev_queue *txq)
+{
+   if (txq->xmit_lock_owner != NULL)
+   return true;
+   return false;
+}
+
+#else
+
+static inline void netdev_queue_set_owner(struct netdev_queue *txq, int cpu)
+{
+   txq->xmit_lock_owner = cpu;
+}
+
+static inline void netdev_queue_clear_owner(struct netdev_queue *txq)
+{
+   txq->xmit_lock_owner = -1;
+}
+
+static inline bool netdev_queue_has_owner(struct netdev_queue *txq)
+{
+   if (txq->xmit_lock_owner != -1)
+   return true;
+   return false;
+}
+#endif
+
 static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu)
 {
spin_lock(&txq->_xmit_lock);
-   txq->xmit_lock_owner = cpu;
+   netdev_queue_set_owner(txq, cpu);
 }
 
 static inline void __netif_tx_lock_bh(struct netdev_queue *txq)
 {
spin_lock_bh(&txq->_xmit_lock);
-   txq->xmit_lock_owner = smp_processor_id();
+   netdev_queue_set_owner(txq, smp_processor_id());
 }
 
 static inline bool __netif_tx_trylock(struct netdev_queue *txq)
 {
bool ok = spin_trylock(&txq->_xmit_lock);
if (likely(ok))
-   txq->xmit_lock_owner = smp_processor_id();
+   netdev_queue_set_owner(txq, smp_processor_id());
return ok;
 }
 
 static inline void __netif_tx_unlock(struct netdev_queue *txq)
 {
-   txq->xmit_lock_owner = -1;
+   netdev_queue_clear_owner(txq);
spin_unlock(&txq->_xmit_lock);
 }
 
 static inline void __netif_tx_unlock_bh(struct netdev_queue *txq)
 {
-   txq->xmit_lock_owner = -1;
+   netdev_queue_clear_owner(txq);
spin_unlock_bh(&txq->_xmit_lock);
 }
 
 static inline void txq_trans_update(struct netdev_queue *txq)
 {
-   if (txq->xmit_lock_owner != -1)
+   if (netdev_queue_has_owner(txq))
txq->trans_start = jiffies;
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 93995575d23a..e7dc4700e463 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3449,7 +3449,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, void 
*accel_priv)
if (dev->flags & IFF_UP) {
int cpu = smp_processor_id(); /* ok because BHs are off */
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+   if (txq->xmit_lock_owner != current) {
+#else
if (txq->xmit_lock_owner != cpu) {
+#endif
if (unlikely(xmit_rec_read() > XMIT_RECURSION_LIMIT))
goto recursion_alert;
 
@@ -7168,7 +7172,7 @@ static void netdev_init_one_queue(struct net_device *dev,
/* Initialize queue lock */
spin_lock_init(&queue->_xmit_lock);
netdev_set_xmit_lockdep_class(&queue->_xmit_lock, dev->type);
-   queue->xmit_lock_owner = -1;
+   netdev_queue_clear_owner(queue);
netdev_queue_numa_node_write(queue, NUMA_NO_NODE);
queue->dev = dev;
 #ifdef CONFIG_BQL
-- 
2.18.0



[PATCH RT 05/22] rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites

2018-08-06 Thread Julia Cartwright
From: Boqun Feng 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit 6b0ef92fee2a3189eba6d6b827b247cb4f6da7e9 ]

When running rcutorture with TREE03 config, CONFIG_PROVE_LOCKING=y, and
kernel cmdline argument "rcutorture.gp_exp=1", lockdep reports a
HARDIRQ-safe->HARDIRQ-unsafe deadlock:

 === WARNING: inconsistent lock state
 4.16.0-rc4+ #1 Not tainted
 
 inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
 takes:
 __schedule+0xbe/0xaf0
 {IN-HARDIRQ-W} state was registered at:
   _raw_spin_lock+0x2a/0x40
   scheduler_tick+0x47/0xf0
...
 other info that might help us debug this:
  Possible unsafe locking scenario:
CPU0
----
   lock(&rq->lock);
   <Interrupt>
 lock(&rq->lock);
  *** DEADLOCK ***
 1 lock held by rcu_torture_rea/724:
 rcu_torture_read_lock+0x0/0x70
 stack backtrace:
 CPU: 2 PID: 724 Comm: rcu_torture_rea Not tainted 4.16.0-rc4+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.11.0-20171110_100015-anatol 04/01/2014
 Call Trace:
  lock_acquire+0x90/0x200
  ? __schedule+0xbe/0xaf0
  _raw_spin_lock+0x2a/0x40
  ? __schedule+0xbe/0xaf0
  __schedule+0xbe/0xaf0
  preempt_schedule_irq+0x2f/0x60
  retint_kernel+0x1b/0x2d
 RIP: 0010:rcu_read_unlock_special+0x0/0x680
  ? rcu_torture_read_unlock+0x60/0x60
  __rcu_read_unlock+0x64/0x70
  rcu_torture_read_unlock+0x17/0x60
  rcu_torture_reader+0x275/0x450
  ? rcutorture_booster_init+0x110/0x110
  ? rcu_torture_stall+0x230/0x230
  ? kthread+0x10e/0x130
  kthread+0x10e/0x130
  ? kthread_create_worker_on_cpu+0x70/0x70
  ? call_usermodehelper_exec_async+0x11a/0x150
  ret_from_fork+0x3a/0x50

This happens with the following event sequence:

preempt_schedule_irq();
  local_irq_enable();
  __schedule():
local_irq_disable(); // irq off
...
rcu_note_context_switch():
  rcu_note_preempt_context_switch():
rcu_read_unlock_special():
  local_irq_save(flags);
  ...
  raw_spin_unlock_irqrestore(...,flags); // irq remains off
  rt_mutex_futex_unlock():
raw_spin_lock_irq();
...
raw_spin_unlock_irq(); // accidentally set irq on

<return to __schedule()>
rq_lock():
  raw_spin_lock(); // acquiring rq->lock with irq on

which means rq->lock becomes a HARDIRQ-unsafe lock, which can cause
deadlocks in scheduler code.

This problem was introduced by commit 02a7c234e540 ("rcu: Suppress
lockdep false-positive ->boost_mtx complaints"). That brought the user
of rt_mutex_futex_unlock() with irq off.

To fix this, replace the *lock_irq() in rt_mutex_futex_unlock() with
*lock_irq{save,restore}() to make it safe to call rt_mutex_futex_unlock()
with irq off.

Fixes: 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx 
complaints")
Signed-off-by: Boqun Feng 
Signed-off-by: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Lai Jiangshan 
Cc: Steven Rostedt 
Cc: Josh Triplett 
Cc: Mathieu Desnoyers 
Cc: "Paul E . McKenney" 
Link: https://lkml.kernel.org/r/20180309065630.8283-1-boqun.f...@gmail.com
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/locking/rtmutex.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 57361d631749..5e15f5c73637 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -2213,11 +2213,12 @@ void __sched rt_mutex_futex_unlock(struct rt_mutex 
*lock)
 {
WAKE_Q(wake_q);
WAKE_Q(wake_sleeper_q);
+   unsigned long flags;
bool postunlock;
 
-   raw_spin_lock_irq(&lock->wait_lock);
+   raw_spin_lock_irqsave(&lock->wait_lock, flags);
postunlock = __rt_mutex_futex_unlock(lock, &wake_q, &wake_sleeper_q);
-   raw_spin_unlock_irq(&lock->wait_lock);
+   raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 
if (postunlock)
rt_mutex_postunlock(&wake_q, &wake_sleeper_q);
-- 
2.18.0



[PATCH RT 20/22] squashfs: make use of local lock in multi_cpu decompressor

2018-08-06 Thread Julia Cartwright
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption enabled, but also ensure
that concurrent accesses to the percpu compressor data on the local CPU
are serialized.

Cc: stable...@vger.kernel.org
Reported-by: Alexander Stein 
Tested-by: Alexander Stein 
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit c160736542d7b3d67da32848d2f028b8e35730e5)
Signed-off-by: Julia Cartwright 
---
 fs/squashfs/decompressor_multi_percpu.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c 
b/fs/squashfs/decompressor_multi_percpu.c
index 23a9c28ad8ea..6a73c4fa88e7 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -25,6 +26,8 @@ struct squashfs_stream {
void*stream;
 };
 
+static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
+
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
void *comp_opts)
 {
@@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, 
struct buffer_head **bh,
 {
struct squashfs_stream __percpu *percpu =
(struct squashfs_stream __percpu *) msblk->stream;
-   struct squashfs_stream *stream = get_cpu_ptr(percpu);
-   int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-   offset, length, output);
-   put_cpu_ptr(stream);
+   struct squashfs_stream *stream;
+   int res;
+
+   stream = get_locked_ptr(stream_lock, percpu);
+
+   res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+   offset, length, output);
+
+   put_locked_ptr(stream_lock, stream);
 
if (res < 0)
ERROR("%s decompression failed, data probably corrupt\n",
-- 
2.18.0



[PATCH RT 08/22] sched, tracing: Fix trace_sched_pi_setprio() for deboosting

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit 4ff648decf4712d39f184fc2df3163f43975575a ]

Since the following commit:

  b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()")

the sched_pi_setprio trace point shows the "newprio" during a deboost:

  |futex sched_pi_setprio: comm=futex_requeue_p pid=2234 oldprio=98 newprio=98
  |futex sched_switch: prev_comm=futex_requeue_p prev_pid=2234 prev_prio=120

This patch open codes __rt_effective_prio() in the tracepoint as the
'newprio' to get the old behaviour back / the correct priority:

  |futex sched_pi_setprio: comm=futex_requeue_p pid=2220 oldprio=98 newprio=120
  |futex sched_switch: prev_comm=futex_requeue_p prev_pid=2220 prev_prio=120

Peter suggested to open code the new priority so people using tracehook
could get the deadline data out.

Reported-by: Mansky Christian 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Fixes: b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()")
Link: http://lkml.kernel.org/r/20180524132647.gg6ziuogczdmj...@linutronix.de
Signed-off-by: Ingo Molnar 
Signed-off-by: Julia Cartwright 
---
 include/trace/events/sched.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 516ae88cddf4..742682079acf 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -429,7 +429,9 @@ TRACE_EVENT(sched_pi_setprio,
memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
__entry->pid= tsk->pid;
__entry->oldprio= tsk->prio;
-   __entry->newprio= pi_task ? pi_task->prio : tsk->prio;
+   __entry->newprio= pi_task ?
+   min(tsk->normal_prio, pi_task->prio) :
+   tsk->normal_prio;
/* XXX SCHED_DEADLINE bits missing */
),
 
-- 
2.18.0



[PATCH RT 03/22] futex: Avoid violating the 10th rule of futex

2018-08-06 Thread Julia Cartwright
From: Peter Zijlstra 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 ]

Julia reported futex state corruption in the following scenario:

   waiter                          waker                          stealer (prio > waiter)

   futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
 timeout=[N ms])
  futex_wait_requeue_pi()
 futex_wait_queue_me()
freezable_schedule()
<scheduled out>
   futex(LOCK_PI, uaddr2)
   futex(CMP_REQUEUE_PI, uaddr,
 uaddr2, 1, 0)
  /* requeues waiter to uaddr2 */
   futex(UNLOCK_PI, uaddr2)
 wake_futex_pi()

cmp_futex_value_locked(uaddr2, waiter)
wake_up_q()
   <woken by waker>
   <hrtimer_wakeup() fires,
    clears sleeper->task>

   futex(LOCK_PI, uaddr2)

  __rt_mutex_start_proxy_lock()

 try_to_take_rt_mutex() /* steals lock */

rt_mutex_set_owner(lock, stealer)

  <preempted>
 <scheduled in>
 rt_mutex_wait_proxy_lock()
__rt_mutex_slowlock()
   try_to_take_rt_mutex() /* fails, lock held by stealer */
   if (timeout && !timeout->task)
  return -ETIMEDOUT;
fixup_owner()
   /* lock wasn't acquired, so,
  fixup_pi_state_owner skipped */

   return -ETIMEDOUT;

   /* At this point, we've returned -ETIMEDOUT to userspace, but the
* futex word shows waiter to be the owner, and the pi_mutex has
* stealer as the owner */

   futex_lock(LOCK_PI, uaddr2)
 -> bails with EDEADLK, futex word says we're owner.

And suggested that what commit:

  73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")

removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:

  16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")

changed all the locking rules. Even without that, the sequence:

-   if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
-   locked = 1;
-   goto out;
-   }

-   raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
-   owner = rt_mutex_owner(&q->pi_state->pi_mutex);
-   if (!owner)
-   owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
-   raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
-   ret = fixup_pi_state_owner(uaddr, q, owner);

already suggests there were races; otherwise we'd never have to look
at next_owner.

So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.

Luckily the values can only change away and not to the value we're
testing, this means we can do a speculative test and double check once
we have the wait_lock.
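
As an illustration of that pattern only (invented names, not the actual
futex code; expected_owner stands in for whichever task the fixup
targets):

	/* speculative test, done without holding wait_lock */
	if (pi_state->owner != expected_owner)
		return 0;

	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
	/* double check now that the value can no longer change under us */
	if (pi_state->owner != expected_owner) {
		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
		return 0;
	}
	/* ... perform the fixup while holding wait_lock ... */
	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);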

Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-by: Julia Cartwright 
Reported-by: Gratian Crisan 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Tested-by: Julia Cartwright 
Tested-by: Gratian Crisan 
Cc: Darren Hart 
Cc: sta...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/20171208124939.7livp7no2ov65...@hirez.programming.kicks-ass.net
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/futex.c  | 83 ++---
 kernel/locking/rtmutex.c| 26 ---
 kernel/locking/rtmutex_common.h |  1 +
 3 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 270148be5647..cdd68ba6e3a6 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2287,21 +2287,17 @@ static void unqueue_me_pi(struct futex_q *q)
spin_unlock(q->

[PATCH RT 15/22] Revert "block: blk-mq: Use swait"

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
This reverts commit "block: blk-mq: Use swait". The issue remains but
will be fixed differently.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry-picked from ca3fd6cf836739fd59eac2f7a9b0261365e818bb)
Signed-off-by: Julia Cartwright 
---
 block/blk-core.c   | 10 +-
 block/blk-mq.c |  6 +++---
 include/linux/blkdev.h |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e4ac43392875..87d3e0a503e5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -662,9 +662,9 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
if (nowait)
return -EBUSY;
 
-   swait_event(q->mq_freeze_wq,
-   !atomic_read(&q->mq_freeze_depth) ||
-   blk_queue_dying(q));
+   wait_event(q->mq_freeze_wq,
+  !atomic_read(&q->mq_freeze_depth) ||
+  blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
}
@@ -680,7 +680,7 @@ static void blk_queue_usage_counter_release(struct 
percpu_ref *ref)
struct request_queue *q =
container_of(ref, struct request_queue, q_usage_counter);
 
-   swake_up_all(&q->mq_freeze_wq);
+   wake_up_all(&q->mq_freeze_wq);
 }
 
 static void blk_rq_timed_out_timer(unsigned long data)
@@ -750,7 +750,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
q->bypass_depth = 1;
__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
 
-   init_swait_queue_head(&q->mq_freeze_wq);
+   init_waitqueue_head(&q->mq_freeze_wq);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e0a804ab5420..3a49552974ec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -72,7 +72,7 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
 
 static void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
-   swait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
+   wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
 
 /*
@@ -110,7 +110,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
WARN_ON_ONCE(freeze_depth < 0);
if (!freeze_depth) {
percpu_ref_reinit(&q->q_usage_counter);
-   swake_up_all(&q->mq_freeze_wq);
+   wake_up_all(&q->mq_freeze_wq);
}
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
@@ -129,7 +129,7 @@ void blk_mq_wake_waiters(struct request_queue *q)
 * dying, we need to ensure that processes currently waiting on
 * the queue are notified as well.
 */
-   swake_up_all(&q->mq_freeze_wq);
+   wake_up_all(&q->mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 34fd1ed9845e..fdb449fe3ff7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -468,7 +468,7 @@ struct request_queue {
struct throtl_data *td;
 #endif
struct rcu_head rcu_head;
-   struct swait_queue_head mq_freeze_wq;
+   wait_queue_head_t   mq_freeze_wq;
struct percpu_ref   q_usage_counter;
struct list_headall_q_node;
 
-- 
2.18.0



[PATCH RT 00/22] Linux 4.9.115-rt94-rc1

2018-08-06 Thread Julia Cartwright
Hello RT folks!

This patchset brings back many RT-specific fixes that have gone into
subsequent 4.14-rt and 4.16-rt releases.

One of my x86 boxes very intermittently triggers a WARN_ON() on bootup
in migrate_enable(), which I'm still trying to triage.  If you can more
reliably reproduce this, please let me know.

This release candidate will not be pushed to the git tree.

To build 4.9.115-rt94-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.115.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.115-rt94-rc1.patch.xz
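
For reference, one way to apply them on top of a vanilla tree (just a
sketch; adjust the paths to wherever you downloaded the files):

  xz -dc linux-4.9.tar.xz | tar xf -
  cd linux-4.9
  xz -dc ../patch-4.9.115.xz | patch -p1
  xz -dc ../patch-4.9.115-rt94-rc1.patch.xz | patch -p1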

If all goes well with testing, this rc will be promoted to an official
release on 8/16/2018.

Please go forth and test!

Thanks,
   Julia

---
Boqun Feng (1):
  rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites

Julia Cartwright (4):
  locallock: provide {get,put}_locked_ptr() variants
  squashfs: make use of local lock in multi_cpu decompressor
  seqlock: provide the same ordering semantics as mainline
  Linux 4.9.115-rt94-rc1

Paul E. McKenney (1):
  rcu: Suppress lockdep false-positive ->boost_mtx complaints

Peter Zijlstra (4):
  futex: Fix pi_state->owner serialization
  futex: Fix more put_pi_state() vs. exit_pi_state_list() races
  futex: Avoid violating the 10th rule of futex
  futex: Fix OWNER_DEAD fixup

Sebastian Andrzej Siewior (12):
  rcu: Do not include rtmutex_common.h unconditionally
  sched, tracing: Fix trace_sched_pi_setprio() for deboosting
  crypto: limit more FPU-enabled sections
  arm*: disable NEON in kernel mode
  mm/slub: close possible memory-leak in kmem_cache_alloc_bulk()
  locking: add types.h
  net: use task_struct instead of CPU number as the queue owner on -RT
  Revert "rt,ntp: Move call to schedule_delayed_work() to helper thread"
  Revert "block: blk-mq: Use swait"
  block: blk-mq: move blk_queue_usage_counter_release() into process
context
  alarmtimer: Prevent live lock in alarm_cancel()
  posix-timers: move the rcu head out of the union

 arch/arm/Kconfig   |   2 +-
 arch/arm64/crypto/Kconfig  |  14 +-
 arch/x86/crypto/camellia_aesni_avx2_glue.c |  20 +++
 arch/x86/crypto/camellia_aesni_avx_glue.c  |  19 +++
 arch/x86/crypto/cast6_avx_glue.c   |  24 +++-
 arch/x86/crypto/chacha20_glue.c|   9 +-
 arch/x86/crypto/serpent_avx2_glue.c|  19 +++
 arch/x86/crypto/serpent_avx_glue.c |  23 +++-
 arch/x86/crypto/serpent_sse2_glue.c|  23 +++-
 arch/x86/crypto/twofish_avx_glue.c |  27 +++-
 arch/x86/include/asm/fpu/api.h |   1 +
 arch/x86/kernel/fpu/core.c |  12 ++
 block/blk-core.c   |  22 +++-
 block/blk-mq.c |   6 +-
 fs/squashfs/decompressor_multi_percpu.c|  16 ++-
 include/linux/blkdev.h |   4 +-
 include/linux/locallock.h  |  10 ++
 include/linux/netdevice.h  |  54 +++-
 include/linux/posix-timers.h   |   2 +-
 include/linux/seqlock.h|   1 +
 include/linux/spinlock_types_raw.h |   2 +
 include/trace/events/sched.h   |   4 +-
 kernel/futex.c | 144 -
 kernel/locking/rtmutex.c   |  31 +++--
 kernel/locking/rtmutex_common.h|   1 +
 kernel/rcu/tree_plugin.h   |   5 +-
 kernel/time/alarmtimer.c   |   2 +-
 kernel/time/ntp.c  |  26 
 kernel/time/posix-timers.c |   4 +-
 localversion-rt|   2 +-
 mm/slub.c  |   1 +
 net/core/dev.c |   6 +-
 32 files changed, 412 insertions(+), 124 deletions(-)

-- 
2.18.0



[PATCH RT 22/22] Linux 4.9.115-rt94-rc1

2018-08-06 Thread Julia Cartwright
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index e98a1fe050bd..dcc2fd2ca155 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt93
+-rt94-rc1
-- 
2.18.0



Re: simple-framebuffer enquire

2018-06-26 Thread Julia Cartwright
On Tue, Jun 26, 2018 at 08:44:58PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 26-06-18 18:35, Michael Nazzareno Trimarchi wrote:
[..]
> > cat memblock/reserved
> > 0: 0x80004000..0x80007fff
> > 1: 0x8010..0x81e030b3
> > 2: 0x8300..0x83007fff
> > 3: 0x8400..0x85ff
> > 4: 0x86fa2000..0x87021fff
> > 
> > +   reserved-memory {
> > +   #address-cells = <1>;
> > +   #size-cells = <1>;
> > +   ranges;
> > +
> > +   display_reserved: framebuffer@86fa2000 {
> > +   reg = <0x86fa2000 0x8>;
> > +   };
> > +
> > +   };
[..]
> > Still have the same on ioremap.
> 
> Hmm, I guess the kernel does map the entire region its get
> passed and simply makes sure to not touch the reserved mem,
> where as with the changes to the passed in mem-region the
> sunxi u-boot code does the memory does not get mapped by
> the kernel at all ?

If the intent is to reserve memory _and_ prevent it from being included
in the kernel's linear map, then it is also necessary to include the
'no-map' property for this reserved-mem node.

From Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt:

   no-map (optional) - empty property
   - Indicates the operating system must not create a virtual mapping
 of the region as part of its standard mapping of system memory,
 nor permit speculative access to it under any circumstances other
 than under the control of the device driver using the region.
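
For the node above, that would look roughly like this (sketch only; the
0x80000 size is what I infer from your memblock dump,
0x86fa2000..0x87021fff):

	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		display_reserved: framebuffer@86fa2000 {
			reg = <0x86fa2000 0x80000>;
			no-map;
		};
	};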

Julia


[ANNOUNCE] 4.9.98-rt76

2018-05-10 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.9.98-rt76 stable release.

This release is just an update to the new stable 4.9.98 version
and no RT specific changes have been made.

Expect a 4.9.98-rt77-rc1 with backports from rt-devel soon.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: bc64e52891b8b9e56b19cbb800a1b8c415df13e8

Or to build 4.9.98-rt76 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.98.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.98-rt76.patch.xz

Enjoy!
Julia



Re: [LINUX PATCH v8 2/2] memory: pl353: Add driver for arm pl353 static memory controller

2018-05-07 Thread Julia Cartwright
On Mon, May 07, 2018 at 10:12:28AM +, Naga Sureshkumar Relli wrote:
> Hi Julia,
>
> Thanks for reviewing the patch and Sorry for my late reply. This patch
> went to junk folder, hence I didn't catch this patch.
>
> From: Julia Cartwright [mailto:ju...@ni.com]
[..]
> > 
> > It would be easier to follow if you constructed your two patchsets
> > with git format-patch --thread.
> >
>
> I am using the same but with out --thread.
>
> > Or, merge them into a single patchset, especially considering the 
> > dependency between patches.
>
> But both are different patches, one for Device tree documentation and other 
> for driver update.

Yes, I'm not proposing you merge _patches_ but _patchsets_.  You have
two patchsets, one for the SMC driver and another for the NAND.  Given
that they depend on one another, it's helpful for reviewers if you send
them all together, with a cover letter which describes the entire
patchset, its dependencies, the revision changelog, etc.

Something like:

  [PATCH v9 0/4] rawnand: memory: add support for PL353 static memory controller + NAND
  [PATCH v9 1/4] Devicetree: Add pl353 smc controller devicetree binding information
  [PATCH v9 2/4] memory: pl353: Add driver for arm pl353 static memory controller
  [PATCH v9 3/4] Documentation: nand: pl353: Add documentation for controller and driver
  [PATCH v9 4/4] mtd: rawnand: pl353: Add basic driver for arm pl353 smc nand interface

Anyway, just with --thread enabled would be an improvement.
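
Concretely, something along these lines would do (a sketch; pick your
own base commit and output directory):

  git format-patch -v9 --cover-letter --thread -o outgoing/ <base>..HEAD
  git send-email outgoing/*.patch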

[..]
> > > --- a/drivers/memory/Kconfig
> > > +++ b/drivers/memory/Kconfig
> > > @@ -152,6 +152,13 @@ config DA8XX_DDRCTL
> > > This driver is for the DDR2/mDDR Memory Controller present on
> > > Texas Instruments da8xx SoCs. It's used to tweak various memory
> > > controller configuration options.
> > > +config PL35X_SMC
> > > + bool "ARM PL35X Static Memory Controller(SMC) driver"
> >
> > Is there any reason why this can't be tristate?
>
> There is a Nand driver which uses this driver. i.e The NAND driver
> Depends on this driver.

That's true, but it's irrelevant to the question I asked.  It is
perfectly valid for both the SMC and NAND drivers to be tristate, so why
are you not allowing this configuration?
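
That is (sketch only, keeping your existing dependencies and help text):

  config PL35X_SMC
          tristate "ARM PL35X Static Memory Controller(SMC) driver"
          ...

A "depends on PL35X_SMC" in the NAND driver's Kconfig works just as well
when both symbols are tristate.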

   Julia


[PATCH RT 1/2] locallock: provide {get,put}_locked_ptr() variants

2018-05-07 Thread Julia Cartwright
Provide a set of locallocked accessors for pointers to per-CPU data;
this is useful for dynamically-allocated per-CPU regions, for example.

These are symmetric with the {get,put}_cpu_ptr() per-CPU accessor
variants.

Signed-off-by: Julia Cartwright 
---
 include/linux/locallock.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index d658c2552601..921eab83cd34 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -222,6 +222,14 @@ static inline int __local_unlock_irqrestore(struct 
local_irq_lock *lv,
 
 #define put_locked_var(lvar, var)  local_unlock(lvar);
 
+#define get_locked_ptr(lvar, var)  \
+   ({  \
+   local_lock(lvar);   \
+   this_cpu_ptr(var);  \
+   })
+
+#define put_locked_ptr(lvar, var)  local_unlock(lvar);
+
 #define local_lock_cpu(lvar)   \
({  \
local_lock(lvar);   \
@@ -262,6 +270,8 @@ static inline void local_irq_lock_init(int lvar) { }
 
 #define get_locked_var(lvar, var)  get_cpu_var(var)
 #define put_locked_var(lvar, var)  put_cpu_var(var)
+#define get_locked_ptr(lvar, var)  get_cpu_ptr(var)
+#define put_locked_ptr(lvar, var)  put_cpu_ptr(var)
 
 #define local_lock_cpu(lvar)   get_cpu()
 #define local_unlock_cpu(lvar) put_cpu()
-- 
2.17.0



[PATCH RT 2/2] squashfs: make use of local lock in multi_cpu decompressor

2018-05-07 Thread Julia Cartwright
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption enabled, but also ensure
that concurrent accesses to the percpu compressor data on the local CPU
are serialized.

Reported-by: Alexander Stein 
Tested-by: Alexander Stein 
Signed-off-by: Julia Cartwright 
---
 fs/squashfs/decompressor_multi_percpu.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c 
b/fs/squashfs/decompressor_multi_percpu.c
index 23a9c28ad8ea..6a73c4fa88e7 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -25,6 +26,8 @@ struct squashfs_stream {
void*stream;
 };
 
+static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
+
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
void *comp_opts)
 {
@@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, 
struct buffer_head **bh,
 {
struct squashfs_stream __percpu *percpu =
(struct squashfs_stream __percpu *) msblk->stream;
-   struct squashfs_stream *stream = get_cpu_ptr(percpu);
-   int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-   offset, length, output);
-   put_cpu_ptr(stream);
+   struct squashfs_stream *stream;
+   int res;
+
+   stream = get_locked_ptr(stream_lock, percpu);
+
+   res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+   offset, length, output);
+
+   put_locked_ptr(stream_lock, stream);
 
if (res < 0)
ERROR("%s decompression failed, data probably corrupt\n",
-- 
2.17.0



[PATCH RT] seqlock: provide the same ordering semantics as mainline

2018-04-26 Thread Julia Cartwright
The mainline implementation of read_seqbegin() orders prior loads w.r.t.
the read-side critical section.  Fixup the RT writer-boosting
implementation to provide the same guarantee.

Also, while we're here, update the usage of ACCESS_ONCE() to use
READ_ONCE().

Fixes: e69f15cf77c23 ("seqlock: Prevent rt starvation")
Cc: stable...@vger.kernel.org
Signed-off-by: Julia Cartwright 
---
Found during code inspection of the RT seqlock implementation.

   Julia

 include/linux/seqlock.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index a59751276b94..597ce5a9e013 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -453,7 +453,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl)
unsigned ret;
 
 repeat:
-   ret = ACCESS_ONCE(sl->seqcount.sequence);
+   ret = READ_ONCE(sl->seqcount.sequence);
if (unlikely(ret & 1)) {
/*
 * Take the lock and let the writer proceed (i.e. evtl
@@ -462,6 +462,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl)
spin_unlock_wait(&sl->lock);
goto repeat;
}
+   smp_rmb();
return ret;
 }
 #endif
-- 
2.16.1



Re: [LINUX PATCH v8 2/2] memory: pl353: Add driver for arm pl353 static memory controller

2018-04-03 Thread Julia Cartwright
Hello-

On Wed, Mar 14, 2018 at 04:14:34PM +0530, nagasureshkumarre...@gmail.com wrote:
> From: Naga Sureshkumar Relli 

I'm pleased to see this patchset revived and resubmitted!

It would be easier to follow if you constructed your two patchsets with
git format-patch --thread.

Or, merge them into a single patchset, especially considering the
dependency between patches.

Some code review comments below:

> Add driver for arm pl353 static memory controller. This controller is
> used in xilinx zynq soc for interfacing the nand and nor/sram memory
> devices.
> 
> Signed-off-by: Naga Sureshkumar Relli 
> ---
> Changes in v8:
> - None
> Changes in v7:
> - Corrected the kconfig to use tristate selection
> - Corrected the GPL licence ident 
> - Added boundary checks for nand timing parameters 
> Changes in v6:
> - Fixed checkpatch.pl reported warnings
> Changes in v5:
> - Added pl353_smc_get_clkrate function, made pl353_smc_set_cycles as public
>   API
> - Removed nand timing parameter initialization and moved it to nand driver  
> Changes in v4:
> - Modified driver to support multiple instances
> - Used sleep instaed of busywait for delay
> Changes in v3:
> - None
> Changes in v2:
> - Since now the timing parameters are in nano seconds, added logic to convert
>   them to the cycles
> ---
>  drivers/memory/Kconfig  |   7 +
>  drivers/memory/Makefile |   1 +
>  drivers/memory/pl353-smc.c  | 548 
> 
>  include/linux/platform_data/pl353-smc.h |  34 ++
>  4 files changed, 590 insertions(+)
>  create mode 100644 drivers/memory/pl353-smc.c
>  create mode 100644 include/linux/platform_data/pl353-smc.h
> 
> diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
> index 19a0e83..d70d6db 100644
> --- a/drivers/memory/Kconfig
> +++ b/drivers/memory/Kconfig
> @@ -152,6 +152,13 @@ config DA8XX_DDRCTL
> This driver is for the DDR2/mDDR Memory Controller present on
> Texas Instruments da8xx SoCs. It's used to tweak various memory
> controller configuration options.
> +config PL35X_SMC
> + bool "ARM PL35X Static Memory Controller(SMC) driver"

Is there any reason why this can't be tristate?

[..]
> +++ b/drivers/memory/pl353-smc.c
[..]
> +/**
> + * struct pl353_smc_data - Private smc driver structure
> + * @devclk:  Pointer to the peripheral clock
> + * @aperclk: Pointer to the APER clock
> + */
> +struct pl353_smc_data {
> + struct clk  *memclk;
> + struct clk  *aclk;
> +};
> +
> +/* SMC virtual register base */
> +static void __iomem *pl353_smc_base;

While it's true that the Zynq chips only have a single instance of this
IP, there is no real reason why an SoC can't instantiate more than one.
I'm a bit uncomfortable with the fact that this has been designed out.

> +
> +/**
> + * pl353_smc_set_buswidth - Set memory buswidth
> + * @bw:  Memory buswidth (8 | 16)
> + * Return: 0 on success or negative errno.
> + */
> +int pl353_smc_set_buswidth(unsigned int bw)
> +{
> +
> + if (bw != PL353_SMC_MEM_WIDTH_8  && bw != PL353_SMC_MEM_WIDTH_16)
> + return -EINVAL;
> +
> + writel(bw, pl353_smc_base + PL353_SMC_SET_OPMODE_OFFS);

There should be no reason not to use the _relaxed() accessor variants.

> + writel(PL353_SMC_DC_UPT_NAND_REGS, pl353_smc_base +
> +PL353_SMC_DIRECT_CMD_OFFS);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pl353_smc_set_buswidth);
> +
> +/**
> + * pl353_smc_set_cycles - Set memory timing parameters
> + * @t0:  t_rcread cycle time
> + * @t1:  t_wcwrite cycle time
> + * @t2:  t_rea/t_ceoeoutput enable assertion delay
> + * @t3:  t_wpwrite enable deassertion delay
> + * @t4:  t_clr/t_pc  page cycle time
> + * @t5:  t_ar/t_ta   ID read time/turnaround time
> + * @t6:  t_rrbusy to RE timing
> + *
> + * Sets NAND chip specific timing parameters.
> + */
> +static void pl353_smc_set_cycles(u32 t0, u32 t1, u32 t2, u32 t3, u32
> +   t4, u32 t5, u32 t6)
> +{
> + t0 &= PL353_SMC_SET_CYCLES_T0_MASK;
> + t1 = (t1 & PL353_SMC_SET_CYCLES_T1_MASK) <<
> + PL353_SMC_SET_CYCLES_T1_SHIFT;
> + t2 = (t2 & PL353_SMC_SET_CYCLES_T2_MASK) <<
> + PL353_SMC_SET_CYCLES_T2_SHIFT;
> + t3 = (t3 & PL353_SMC_SET_CYCLES_T3_MASK) <<
> + PL353_SMC_SET_CYCLES_T3_SHIFT;
> + t4 = (t4 & PL353_SMC_SET_CYCLES_T4_MASK) <<
> + PL353_SMC_SET_CYCLES_T4_SHIFT;
> + t5 = (t5 & PL353_SMC_SET_CYCLES_T5_MASK) <<
> + PL353_SMC_SET_CYCLES_T5_SHIFT;
> + t6 = (t6 & PL353_SMC_SET_CYCLES_T6_MASK) <<
> + PL353_SMC_SET_CYCLES_T6_SHIFT;
> +
> + t0 |= t1 | t2 | t3 | t4 | t5 | t6;
> +
> + writel(t0, pl353_smc_base + PL353_SMC_SET_CYCLES_OFFS);
> + writel(PL353_SMC_DC_UPT_NAND_REGS, pl353_smc_base +
> +   

Re: [PATCH RT] Defer migrate_enable migration while task state != TASK_RUNNING

2018-03-23 Thread Julia Cartwright
On Fri, Mar 23, 2018 at 01:21:31PM -0400, joe.ko...@concurrent-rt.com wrote:
> On Fri, Mar 23, 2018 at 11:59:21AM -0500, Julia Cartwright wrote:
> > On Fri, Mar 23, 2018 at 11:09:59AM -0400, joe.ko...@concurrent-rt.com wrote:
> > > I see the below kernel splat in 4.9-rt when I run a test program that
> > > continually changes the affinity of some set of running pids:
> > >
> > >do not call blocking ops when !TASK_RUNNING; state=2 set at ...
> > >   ...
> > >   stop_one_cpu+0x60/0x80
> > >   migrate_enable+0x21f/0x3e0
> > >   rt_spin_unlock+0x2f/0x40
> > >   prepare_to_wait+0x5c/0x80
> > >   ...
> >
> > This is clearly a problem.
> >
> > > The reason is that spin_unlock, write_unlock, and read_unlock call
> > > migrate_enable, and since 4.4-rt, migrate_enable will sleep if it 
> > > discovers
> > > that a migration is in order.  But sleeping in the unlock services is not
> > > expected by most kernel developers,
> >
> > I don't buy this, see below:
> >
> > > and where that counts most is in code sequences like the following:
> > >
> > >   set_current_state(TASK_UNINTERRUPIBLE);
> > >   spin_unlock(&s);
> > >   schedule();
> >
> > The analog in mainline is CONFIG_PREEMPT and the implicit
> > preempt_enable() in spin_unlock().  In this configuration, a kernel
> > developer should _absolutely_ expect their task to be suspended (and
> > potentially migrated), _regardless of the task state_ if there is a
> > preemption event on the CPU on which this task is executing.
> >
> > Similarly, on RT, there is nothing _conceptually_ wrong on RT with
> > migrating on migrate_enable(), regardless of task state, if there is a
> > pending migration event.
>
> My understanding is, in standard Linux and in rt, setting
> task state to anything other than TASK_RUNNING in of itself
> blocks preemption.

I'm assuming you're referring to the window of time between a task
setting its state to !TASK_RUNNING and schedule()?  The task remains
preemptible in this window.

> A preemption is not really needed here as it is expected that there is
> a schedule() written in that will shortly be executed.
>
> And if a 'involuntary schedule' (ie, preemption) were allowed to occur
> between the task state set and the schedule(), that would change the
> task state back to TASK_RUNNING;

This isn't the case.  A preempted task preserves its state.
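
For reference, the canonical pattern (wq and condition here are just
placeholders):

	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
		if (condition)
			break;
		schedule();
	}
	finish_wait(&wq, &wait);

A preemption anywhere in there leaves the recorded task state intact;
the task simply runs again later in the same state, so the voluntary
schedule() that follows still behaves as the author intended.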

   Julia


Re: [PATCH RT] Defer migrate_enable migration while task state != TASK_RUNNING

2018-03-23 Thread Julia Cartwright
Hey Joe-

Thanks for the writeup.

On Fri, Mar 23, 2018 at 11:09:59AM -0400, joe.ko...@concurrent-rt.com wrote:
> I see the below kernel splat in 4.9-rt when I run a test program that
> continually changes the affinity of some set of running pids:
> 
>do not call blocking ops when !TASK_RUNNING; state=2 set at ...
>   ...
>   stop_one_cpu+0x60/0x80
>   migrate_enable+0x21f/0x3e0
>   rt_spin_unlock+0x2f/0x40
>   prepare_to_wait+0x5c/0x80
>   ...

This is clearly a problem.

> The reason is that spin_unlock, write_unlock, and read_unlock call
> migrate_enable, and since 4.4-rt, migrate_enable will sleep if it discovers
> that a migration is in order.  But sleeping in the unlock services is not
> expected by most kernel developers,

I don't buy this, see below:

> and where that counts most is in code sequences like the following:
>
>   set_current_state(TASK_UNINTERRUPIBLE);
>   spin_unlock(&s);
>   schedule();

The analog in mainline is CONFIG_PREEMPT and the implicit
preempt_enable() in spin_unlock().  In this configuration, a kernel
developer should _absolutely_ expect their task to be suspended (and
potentially migrated), _regardless of the task state_ if there is a
preemption event on the CPU on which this task is executing.

Similarly, on RT, there is nothing _conceptually_ wrong on RT with
migrating on migrate_enable(), regardless of task state, if there is a
pending migration event.

It's clear, however, that the mechanism used here is broken ...

   Julia


Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core

2018-03-06 Thread Julia Cartwright
On Wed, Feb 21, 2018 at 08:15:59AM -0800, Jae Hyun Yoo wrote:
> This commit adds driver implementation for PECI bus into linux
> driver framework.
> 
> Signed-off-by: Jae Hyun Yoo 
> ---
[..]
> +static int peci_locked_xfer(struct peci_adapter *adapter,
> + struct peci_xfer_msg *msg,
> + bool do_retry,
> + bool has_aw_fcs)

_locked generally means that this function is invoked with some critical
lock already held; what lock does the caller need to acquire before
invoking this function?
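
For comparison, the usual convention looks something like this
(illustrative only; the foo_* names are made up, not from this driver):

	/* caller must hold foo->lock */
	static int foo_xfer_locked(struct foo_dev *foo, struct foo_msg *msg)
	{
		/* ... perform the transfer ... */
		return 0;
	}

	static int foo_xfer(struct foo_dev *foo, struct foo_msg *msg)
	{
		int ret;

		mutex_lock(&foo->lock);
		ret = foo_xfer_locked(foo, msg);
		mutex_unlock(&foo->lock);
		return ret;
	}

i.e. the _locked suffix tells the reader which lock the caller is
expected to already hold.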

> +{
> + ktime_t start, end;
> + s64 elapsed_ms;
> + int rc = 0;
> +
> + if (!adapter->xfer) {

Is this really an optional feature of an adapter?  If this is not
optional, then this check should be in place when the adapter is
registered, not here.  (And it should WARN_ON(), because it's a driver
developer error).

> + dev_dbg(&adapter->dev, "PECI level transfers not supported\n");
> + return -ENODEV;
> + }
> +
> + if (in_atomic() || irqs_disabled()) {

As Andrew mentioned, this is broken.

You don't even need a might_sleep().  The locking functions you use here
will already include a might_sleep() w/ CONFIG_DEBUG_ATOMIC_SLEEP.

> + rt_mutex_trylock(&adapter->bus_lock);
> + if (!rc)
> + return -EAGAIN; /* PECI activity is ongoing */
> + } else {
> + rt_mutex_lock(&adapter->bus_lock);
> + }
> +
> + if (do_retry)
> + start = ktime_get();
> +
> + do {
> + rc = adapter->xfer(adapter, msg);
> +
> + if (!do_retry)
> + break;
> +
> + /* Per the PECI spec, need to retry commands that return 0x8x */
> + if (!(!rc && ((msg->rx_buf[0] & DEV_PECI_CC_RETRY_ERR_MASK) ==
> +   DEV_PECI_CC_TIMEOUT)))
> + break;

This is pretty difficult to parse.  Can you split it into two different
conditions?

> +
> + /* Set the retry bit to indicate a retry attempt */
> + msg->tx_buf[1] |= DEV_PECI_RETRY_BIT;

Are you sure this bit is to be set in the _second_ byte of tx_buf?

> +
> + /* Recalculate the AW FCS if it has one */
> + if (has_aw_fcs)
> + msg->tx_buf[msg->tx_len - 1] = 0x80 ^
> + peci_aw_fcs((u8 *)msg,
> + 2 + msg->tx_len);
> +
> + /* Retry for at least 250ms before returning an error */
> + end = ktime_get();
> + elapsed_ms = ktime_to_ms(ktime_sub(end, start));
> + if (elapsed_ms >= DEV_PECI_RETRY_TIME_MS) {
> + dev_dbg(&adapter->dev, "Timeout retrying xfer!\n");
> + break;
> + }
> + } while (true);
> +
> + rt_mutex_unlock(&adapter->bus_lock);
> +
> + return rc;
> +}
> +
> +static int peci_xfer(struct peci_adapter *adapter, struct peci_xfer_msg *msg)
> +{
> + return peci_locked_xfer(adapter, msg, false, false);
> +}
> +
> +static int peci_xfer_with_retries(struct peci_adapter *adapter,
> +   struct peci_xfer_msg *msg,
> +   bool has_aw_fcs)
> +{
> + return peci_locked_xfer(adapter, msg, true, has_aw_fcs);
> +}
> +
> +static int peci_scan_cmd_mask(struct peci_adapter *adapter)
> +{
> + struct peci_xfer_msg msg;
> + u32 dib;
> + int rc = 0;
> +
> + /* Update command mask just once */
> + if (adapter->cmd_mask & BIT(PECI_CMD_PING))
> + return 0;
> +
> + msg.addr  = PECI_BASE_ADDR;
> + msg.tx_len= GET_DIB_WR_LEN;
> + msg.rx_len= GET_DIB_RD_LEN;
> + msg.tx_buf[0] = GET_DIB_PECI_CMD;
> +
> + rc = peci_xfer(adapter, &msg);
> + if (rc < 0) {
> + dev_dbg(&adapter->dev, "PECI xfer error, rc : %d\n", rc);
> + return rc;
> + }
> +
> + dib = msg.rx_buf[0] | (msg.rx_buf[1] << 8) |
> +   (msg.rx_buf[2] << 16) | (msg.rx_buf[3] << 24);
> +
> + /* Check special case for Get DIB command */
> + if (dib == 0x00) {
> + dev_dbg(&adapter->dev, "DIB read as 0x00\n");
> + return -1;
> + }
> +
> + if (!rc) {

You should change this to:

	if (rc) {
		dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc);
		return rc;
	}

And then leave the happy path below unindented.

> + /**
> +  * setting up the supporting commands based on minor rev#
> +  * see PECI Spec Table 3-1
> +  */
> + dib = (dib >> 8) & 0xF;
> +
> + if (dib >= 0x1) {
> + adapter->cmd_mask |= BIT(PECI_CMD_RD_PKG_CFG);
> + adapter->cmd_mask |= BIT(PECI_CMD_WR_PKG_CFG);
> + }
> +
> + if (dib >= 0x2)
> + adapter->cmd_mask |= BI

Re: C tricks for efficient stack zeroing

2018-03-06 Thread Julia Cartwright
On Fri, Mar 02, 2018 at 08:50:17PM +0100, Jason A. Donenfeld wrote:
[..]
> What would be really nice would be to somehow keep track of the
> maximum stack depth, and just before the function returns, clear from
> the maximum depth to its stack base, all in one single call. This
> would not only make the code faster and less brittle, but it would
> also clean up some algorithms quite a bit.
> 
> Ideally this would take the form of a gcc attribute on the function,
> but I was unable to find anything of that nature. I started looking
> for little C tricks for this, and came up dry too. I realize I could
> probably just take the current stack address and zero out until _the
> very end_ but that seems to overshoot and would probably be bad for
> performance. The best I've been able to do come up with are some
> x86-specific macros, but that approach seems a bit underwhelming.
> Other approaches include adding a new attribute via the gcc plugin
> system, which could make this kind of thing more complete [cc'ing
> pipacs in case he's thought about that before].

Can objtool support a static stack usage analysis?

I'm wondering if it's possible to place these sensitive functions in a
special linker section, like .text.stackzero.<name>; objtool could
collect static call data (as it already does) and stack usage, spitting
out a symbol definition stackzero_<name>_max_depth, which you could
then use to bound your zeroing.

Obviously this is a static analysis, with the limitations therein.
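
To make that concrete, a purely hypothetical sketch of what consuming
such a bound could look like (the section name, the objtool-emitted
symbol, and the frame-address arithmetic are all made up -- nothing
like this exists today):

	#include <linux/string.h>	/* memzero_explicit() */

	/* hypothetical: objtool would emit this per-group bound */
	extern const unsigned long stackzero_crypto_max_depth;

	#define __stackzero(grp) \
		__attribute__((__section__(".text.stackzero." #grp)))

	static void __stackzero(crypto) do_sensitive_work(void)
	{
		char *frame = __builtin_frame_address(0);

		/* ... work on sensitive on-stack state, calling helpers ... */

		/*
		 * Zero only the region the call tree below this frame could
		 * have dirtied (stack grows down), rather than zeroing all
		 * the way to the end of the stack.
		 */
		memzero_explicit(frame - stackzero_crypto_max_depth,
				 stackzero_crypto_max_depth);
	}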

   Julia


[ANNOUNCE] 4.9.84-rt62

2018-02-27 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.9.84-rt62 stable release.

This release is just an update to the new stable 4.9.84 version
and no RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: be42cd02846a611af533103a3f4b6a7d8c592f49

Or to build 4.9.84-rt62 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.84.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.84-rt62.patch.xz

Enjoy!
Julia



Re: [PATCH 1/2] kernel/sofirq: consolidate common code in __tasklet_schedule() + _hi_

2018-02-15 Thread Julia Cartwright
On Thu, Feb 15, 2018 at 03:07:07PM -0500, Steven Rostedt wrote:
> On Thu, 15 Feb 2018 18:20:41 +0100
> Sebastian Andrzej Siewior  wrote:
> 
> > -void __tasklet_schedule(struct tasklet_struct *t)
> > +static void __tasklet_schedule_common(struct tasklet_struct *t,
> > + struct tasklet_head *head,
> > + unsigned int softirq_nr)
> >  {
> > unsigned long flags;
> >  
> > local_irq_save(flags);
> 
> If you look at the original patch, it did not move local_irq_save()
> into the common function.
> 
> > t->next = NULL;
> > -   *__this_cpu_read(tasklet_vec.tail) = t;
> > -   __this_cpu_write(tasklet_vec.tail, &(t->next));
> > -   raise_softirq_irqoff(TASKLET_SOFTIRQ);
> > +   *head->tail = t;
> > +   head->tail = &(t->next);
> > +   raise_softirq_irqoff(softirq_nr);
> > local_irq_restore(flags);
> >  }
> > +
> > +void __tasklet_schedule(struct tasklet_struct *t)
> > +{
> > +   __tasklet_schedule_common(t, this_cpu_ptr(&tasklet_vec),
> 
> What can happen is, we reference (tasklet_vec) on one CPU, get
> preempted (running in ksoftirqd), scheduled on another CPU, then when
> inside the common code, we are executing on a different CPU than the
> tasklet is for. The rasise_softirq() is happening on the wrong CPU.
> 
> The local_irq_save() can't be moved to the common function. It must be
> done by each individual function.

Well, it can be, but the percpu access needs to go with it; so an
alternative solution would be the below.

I'm also wondering whether the t->next = NULL assignment could be
lifted out of the irqs-disabled region.
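
i.e. something like this in __tasklet_schedule_common() (untested, just
to make the question concrete):

	/*
	 * t is not on any list at this point (the caller owns the SCHED
	 * bit), so this store may not need irqs disabled -- that's the
	 * open question above.
	 */
	t->next = NULL;

	local_irq_save(flags);
	head = this_cpu_ptr(headp);
	*head->tail = t;
	head->tail = &(t->next);
	raise_softirq_irqoff(softirq_nr);
	local_irq_restore(flags);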

   Julia

diff --git a/kernel/softirq.c b/kernel/softirq.c
index fa7ed89a9fcf..177de3640c78 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -461,12 +461,14 @@ static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec);
 static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec);
 
 static void __tasklet_schedule_common(struct tasklet_struct *t,
- struct tasklet_head *head,
+ struct tasklet_head __percpu *headp,
  unsigned int softirq_nr)
 {
+   struct tasklet_head *head;
unsigned long flags;
 
local_irq_save(flags);
+   head = this_cpu_ptr(headp);
t->next = NULL;
*head->tail = t;
head->tail = &(t->next);
@@ -476,14 +478,14 @@ static void __tasklet_schedule_common(struct 
tasklet_struct *t,
 
 void __tasklet_schedule(struct tasklet_struct *t)
 {
-   __tasklet_schedule_common(t, this_cpu_ptr(&tasklet_vec),
+   __tasklet_schedule_common(t, &tasklet_vec,
  TASKLET_SOFTIRQ);
 }
 EXPORT_SYMBOL(__tasklet_schedule);
 
 void __tasklet_hi_schedule(struct tasklet_struct *t)
 {
-   __tasklet_schedule_common(t, this_cpu_ptr(&tasklet_hi_vec),
+   __tasklet_schedule_common(t, &tasklet_hi_vec,
  HI_SOFTIRQ);
 }
 EXPORT_SYMBOL(__tasklet_hi_schedule);


[PATCH 4.9.y] ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs

2018-02-13 Thread Julia Cartwright
From: Xiaolei Li 

This is a conceptual cherry-pick of commit
d8db5b1ca9d4c57e49893d0f78e6d5ce81450cc8 upstream.

The inode is not locked in init_xattrs when creating a new inode.

Without this patch, an assertion failure occurs when booting or creating
a new file if the kernel config CONFIG_SECURITY_SMACK is enabled.

The log looks like:

UBIFS assert failed in ubifs_xattr_set at 298 (pid 1156)
CPU: 1 PID: 1156 Comm: ldconfig Tainted: G S 4.12.0-rc1-207440-g1e70b02 #2
Hardware name: MediaTek MT2712 evaluation board (DT)
Call trace:
[] dump_backtrace+0x0/0x238
[] show_stack+0x14/0x20
[] dump_stack+0x9c/0xc0
[] ubifs_xattr_set+0x374/0x5e0
[] init_xattrs+0x5c/0xb8
[] security_inode_init_security+0x110/0x190
[] ubifs_init_security+0x30/0x68
[] ubifs_mkdir+0x100/0x200
[] vfs_mkdir+0x11c/0x1b8
[] SyS_mkdirat+0x74/0xd0
[] __sys_trace_return+0x0/0x4

Signed-off-by: Xiaolei Li 
Signed-off-by: Richard Weinberger 
Cc: sta...@vger.kernel.org
(julia: massaged to apply to 4.9.y, which doesn't contain fscrypto support)
Signed-off-by: Julia Cartwright 
---
Hey all-

We reproduced the issue fixed upstream by Xiaolei Li's commit in 4.9.y,
with the very similar backtrace:

   UBIFS assert failed in __ubifs_setxattr at 282 (pid 1362)
   CPU: 1 PID: 1362 Comm: sed Not tainted 4.9.47-rt37 #1
   Backtrace:
  (dump_backtrace) from (show_stack+0x20/0x24)
  (show_stack) from (dump_stack+0x80/0xa0)
  (dump_stack) from (__ubifs_setxattr+0x84/0x634)
  (__ubifs_setxattr) from (init_xattrs+0x70/0xac)
  (init_xattrs) from (security_inode_init_security+0x100/0x144)
  (security_inode_init_security) from (ubifs_init_security+0x38/0x6c)
  (ubifs_init_security) from (ubifs_create+0xe8/0x1fc)
  (ubifs_create) from (path_openat+0x97c/0x1090)
  (path_openat) from (do_filp_open+0x48/0x94)
  (do_filp_open) from (do_sys_open+0x134/0x1d8)
  (do_sys_open) from (SyS_open+0x30/0x34)
  (SyS_open) from (ret_fast_syscall+0x0/0x3c)

Please consider applying this to 4.9.y at the very least, it may apply
further back as well.

Thanks!
   Julia

 fs/ubifs/xattr.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
index d9f9615bfd71..3979d767a0cb 100644
--- a/fs/ubifs/xattr.c
+++ b/fs/ubifs/xattr.c
@@ -270,7 +270,8 @@ static struct inode *iget_xattr(struct ubifs_info *c, ino_t 
inum)
 }
 
 static int __ubifs_setxattr(struct inode *host, const char *name,
-   const void *value, size_t size, int flags)
+   const void *value, size_t size, int flags,
+   bool check_lock)
 {
struct inode *inode;
struct ubifs_info *c = host->i_sb->s_fs_info;
@@ -279,7 +280,8 @@ static int __ubifs_setxattr(struct inode *host, const char 
*name,
union ubifs_key key;
int err;
 
-   ubifs_assert(inode_is_locked(host));
+   if (check_lock)
+   ubifs_assert(inode_is_locked(host));
 
if (size > UBIFS_MAX_INO_DATA)
return -ERANGE;
@@ -548,7 +550,8 @@ static int init_xattrs(struct inode *inode, const struct 
xattr *xattr_array,
}
strcpy(name, XATTR_SECURITY_PREFIX);
strcpy(name + XATTR_SECURITY_PREFIX_LEN, xattr->name);
-   err = __ubifs_setxattr(inode, name, xattr->value, 
xattr->value_len, 0);
+   err = __ubifs_setxattr(inode, name, xattr->value,
+  xattr->value_len, 0, false);
kfree(name);
if (err < 0)
break;
@@ -594,7 +597,8 @@ static int ubifs_xattr_set(const struct xattr_handler 
*handler,
name = xattr_full_name(handler, name);
 
if (value)
-   return __ubifs_setxattr(inode, name, value, size, flags);
+   return __ubifs_setxattr(inode, name, value, size, flags,
+   true);
else
return __ubifs_removexattr(inode, name);
 }
-- 
2.16.1



Re: [patch v18 1/4] drivers: jtag: Add JTAG core driver

2018-01-29 Thread Julia Cartwright
On Mon, Jan 29, 2018 at 04:31:42PM +0200, Oleksandr Shamray wrote:
> Initial patch for JTAG driver
> JTAG class driver provide infrastructure to support hardware/software
> JTAG platform drivers. It provide user layer API interface for flashing
> and debugging external devices which equipped with JTAG interface
> using standard transactions.
> 
> Driver exposes set of IOCTL to user space for:
> - XFER:
> - SIR (Scan Instruction Register, IEEE 1149.1 Data Register scan);
> - SDR (Scan Data Register, IEEE 1149.1 Instruction Register scan);
> - RUNTEST (Forces the IEEE 1149.1 bus to a run state for a specified
>   number of clocks).
> - SIOCFREQ/GIOCFREQ for setting and reading JTAG frequency.
> 
> Driver core provides set of internal APIs for allocation and
> registration:
> - jtag_register;
> - jtag_unregister;
> - jtag_alloc;
> - jtag_free;
> 
> Platform driver on registration with jtag-core creates the next
> entry in dev folder:
> /dev/jtagX
> 
> Signed-off-by: Oleksandr Shamray 
> Signed-off-by: Jiri Pirko 
> Acked-by: Philippe Ombredanne 
[..]
> diff --git a/drivers/jtag/jtag.c b/drivers/jtag/jtag.c
[..]
> +struct jtag *jtag_alloc(size_t priv_size, const struct jtag_ops *ops)
> +{
> + struct jtag *jtag;
> +
> + jtag = kzalloc(sizeof(*jtag) + priv_size, GFP_KERNEL);
> + if (!jtag)
> + return NULL;
> +
> + if (!ops)
> + return NULL;
> +
> + if (!ops->idle || !ops->mode_set || !ops->status_get || !ops->xfer)
> + return NULL;

Did you think through this?

You leak 'jtag' here and above.

Perform all the ops checks prior to the allocation.

   Julia


Re: [patch v17 1/4] drivers: jtag: Add JTAG core driver

2018-01-17 Thread Julia Cartwright
Hello Oleksandr-

On Tue, Jan 16, 2018 at 09:18:56AM +0200, Oleksandr Shamray wrote:
[..]
> v16->v17
> Comments pointed by Julia Cartwright 

More review feedback below:

[..]
> +++ b/drivers/jtag/jtag.c
[..]
> +static long jtag_ioctl(struct file *file, unsigned int cmd, unsigned long 
> arg)
> +{
> + struct jtag *jtag = file->private_data;
> + struct jtag_run_test_idle idle;
> + struct jtag_xfer xfer;
> + u8 *xfer_data;
> + u32 data_size;
> + u32 value;
> + int err;
> +
> + if (!arg)
> + return -EINVAL;
> +
> + switch (cmd) {
> + case JTAG_GIOCFREQ:
> + if (!jtag->ops->freq_get)
> + err = -EOPNOTSUPP;

Did you mean:

return -EOPNOTSUPP;

?

> +
> + err = jtag->ops->freq_get(jtag, &value);

Otherwise your check was worthless; you'll call through a NULL pointer here.

Also, w.r.t. the set of ops which are required to be implemented: this
isn't the right place to do the check.

Instead, do it in jtag_alloc():

struct jtag *jtag_alloc(size_t priv_size, const struct jtag_ops *ops)
{
	struct jtag *jtag;

	if (!ops->freq_get || !ops->xfer || ...) /* fixup condition */
		return NULL;

	jtag = kzalloc(sizeof(*jtag) + priv_size, GFP_KERNEL);
	if (!jtag)
		return NULL;

	jtag->ops = ops;
	return jtag;
}
EXPORT_SYMBOL_GPL(jtag_alloc);

[..]
> + case JTAG_IOCXFER:
[..]
> + data_size = DIV_ROUND_UP(xfer.length, BITS_PER_BYTE);
> + xfer_data = memdup_user(u64_to_user_ptr(xfer.tdio), data_size);
> +
> + if (!xfer_data)

memdup_user() doesn't return NULL on error.  You need to check for
IS_ERR(xfer_data).
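
Something like:

	xfer_data = memdup_user(u64_to_user_ptr(xfer.tdio), data_size);
	if (IS_ERR(xfer_data))
		return PTR_ERR(xfer_data);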

> + return -EFAULT;
> +
> + err = jtag->ops->xfer(jtag, &xfer, xfer_data);
> + if (err) {
> + kfree(xfer_data);
> + return -EFAULT;
> + }
> +
> + err = copy_to_user(u64_to_user_ptr(xfer.tdio),
> +(void *)(xfer_data), data_size);
> +
> + if (err) {
> + kfree(xfer_data);
> + return -EFAULT;
> + }
> +
> + kfree(xfer_data);

Move the kfree() above the if (err).
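
That is (sketch):

	err = copy_to_user(u64_to_user_ptr(xfer.tdio), xfer_data, data_size);
	kfree(xfer_data);
	if (err)
		return -EFAULT;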

> + if (copy_to_user((void *)arg, &xfer, sizeof(struct jtag_xfer)))
> + return -EFAULT;
> + break;
> +
> + case JTAG_GIOCSTATUS:
> + if (!jtag->ops->status_get)
> + return -EOPNOTSUPP;
> +
> + err = jtag->ops->status_get(jtag, &value);
> + if (err)
> + break;
> +
> + err = put_user(value, (__u32 *)arg);
> + if (err)
> + err = -EFAULT;

put_user() returns -EFAULT on failure, so this shouldn't be necessary.
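
i.e. this can collapse to (assuming err is simply returned after the
switch):

	err = put_user(value, (__u32 *)arg);
	break;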

[..]
> --- /dev/null
> +++ b/include/uapi/linux/jtag.h
[..]
> +/**
> + * struct jtag_xfer - jtag xfer:
> + *
> + * @type: transfer type
> + * @direction: xfer direction
> + * @length: xfer bits len
> + * @tdio : xfer data array
> + * @endir: xfer end state
> + *
> + * Structure represents interface to JTAG device for jtag sdr xfer
> + * execution.
> + */
> +struct jtag_xfer {
> + __u8type;
> + __u8direction;
> + __u8endstate;

Just to be as unambiguous as possible, considering this is ABI, I would
suggest explicitly putting a padding byte here (see the sketch after the
quoted struct below).

> + __u32   length;
> + __u64   tdio;
> +};
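
For example (a sketch of the padding suggestion above; the name of the
pad field is up to you):

	struct jtag_xfer {
		__u8	type;
		__u8	direction;
		__u8	endstate;
		__u8	padding;	/* explicit pad, keeps the layout unambiguous */
		__u32	length;
		__u64	tdio;
	};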

Thanks,
   Julia


Re: [ANNOUNCE] 4.9.76-rt61

2018-01-16 Thread Julia Cartwright
On Tue, Jan 16, 2018 at 11:11:32PM +, Bernhard Landauer wrote:
> Ah. Working now after adding in validpgpkeys array in the PKGBUILD.

This was the first v4.9-rt release since Steven handed over his
maintainership, so that would explain why it would be signed by a
different key (although my key has been signed by Steven's).

Thanks,
   Julia


[ANNOUNCE] 4.9.76-rt61

2018-01-16 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.9.76-rt61 stable release.

This release is just an update to the new stable 4.9.76 version
and no RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: 90816cc1d4a1d23efe37b74866c6174dd5eab6b5

Or to build 4.9.76-rt61 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.76.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.76-rt61.patch.xz

Enjoy,
   Julia


Re: [patch v16 1/4] drivers: jtag: Add JTAG core driver

2018-01-15 Thread Julia Cartwright
Hello Oleksandr-

I have a few comments below.

On Fri, Jan 12, 2018 at 07:08:26PM +0200, Oleksandr Shamray wrote:
> Initial patch for JTAG driver
> JTAG class driver provide infrastructure to support hardware/software
> JTAG platform drivers. It provide user layer API interface for flashing
> and debugging external devices which equipped with JTAG interface
> using standard transactions.
> 
> Driver exposes set of IOCTL to user space for:
> - XFER:
> - SIR (Scan Instruction Register, IEEE 1149.1 Data Register scan);
> - SDR (Scan Data Register, IEEE 1149.1 Instruction Register scan);
> - RUNTEST (Forces the IEEE 1149.1 bus to a run state for a specified
>   number of clocks).
> - SIOCFREQ/GIOCFREQ for setting and reading JTAG frequency.
> 
> Driver core provides set of internal APIs for allocation and
> registration:
> - jtag_register;
> - jtag_unregister;
> - jtag_alloc;
> - jtag_free;
> 
> Platform driver on registration with jtag-core creates the next
> entry in dev folder:
> /dev/jtagX
> 
> Signed-off-by: Oleksandr Shamray 
> Signed-off-by: Jiri Pirko 
> Acked-by: Philippe Ombredanne 

[..]

> +static long jtag_ioctl(struct file *file, unsigned int cmd, unsigned long 
> arg)
> +{
> + struct jtag *jtag = file->private_data;
> + struct jtag_run_test_idle idle;
> + struct jtag_xfer xfer;
> + u8 *xfer_data;
> + u32 data_size;
> + u32 value;
> + int err;
> +
> + if (!arg)
> + return -EINVAL;
> +
> + switch (cmd) {
> + case JTAG_GIOCFREQ:
> + if (!jtag->ops->freq_get)
> + err = -EOPNOTSUPP;
> +
> + err = jtag->ops->freq_get(jtag, &value);
> + if (err)
> + break;
> +
> + if (put_user(value, (__u32 *)arg))
> + err = -EFAULT;
> + break;
> +
> + case JTAG_SIOCFREQ:
> + if (!jtag->ops->freq_set)
> + return -EOPNOTSUPP;
> +
> + if (get_user(value, (__u32 *)arg))
> + return -EFAULT;
> + if (value == 0)
> + return -EINVAL;
> +
> + err = jtag->ops->freq_set(jtag, value);
> + break;
> +
> + case JTAG_IOCRUNTEST:
> + if (!jtag->ops->idle)
> + return -EOPNOTSUPP;
> +
> + if (copy_from_user(&idle, (void *)arg,
> +sizeof(struct jtag_run_test_idle)))
> + return -EFAULT;
> +
> + if (idle.endstate > JTAG_STATE_PAUSEDR)
> + return -EINVAL;
> +
> + err = jtag->ops->idle(jtag, &idle);
> + break;
> +
> + case JTAG_IOCXFER:
> + if (!jtag->ops->xfer)

Are all ops optional?  That seems bizarre.  I would have expected at
least one callback to be required.

[..]
> +static int jtag_open(struct inode *inode, struct file *file)
> +{
> + struct jtag *jtag = container_of(file->private_data, struct jtag,
> +  miscdev);
> +
> + if (mutex_lock_interruptible(&jtag->open_lock))
> + return -ERESTARTSYS;
> +
> + if (jtag->opened) {
> + mutex_unlock(&jtag->open_lock);
> + return -EBUSY;
> + }
> +
> + nonseekable_open(inode, file);
> + file->private_data = jtag;

These two can be moved out of the lock; a sketch follows the quoted
function below.

> + jtag->opened = true;
> + mutex_unlock(&jtag->open_lock);
> + return 0;
> +}
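
For example (sketch based only on the quoted code):

	static int jtag_open(struct inode *inode, struct file *file)
	{
		struct jtag *jtag = container_of(file->private_data,
						 struct jtag, miscdev);

		/* neither of these needs to be serialized by open_lock */
		nonseekable_open(inode, file);
		file->private_data = jtag;

		if (mutex_lock_interruptible(&jtag->open_lock))
			return -ERESTARTSYS;

		if (jtag->opened) {
			mutex_unlock(&jtag->open_lock);
			return -EBUSY;
		}

		jtag->opened = true;
		mutex_unlock(&jtag->open_lock);
		return 0;
	}
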
> +
> +static int jtag_release(struct inode *inode, struct file *file)
> +{
> + struct jtag *jtag = file->private_data;
> +
> + mutex_lock(&jtag->open_lock);
> + jtag->opened = false;
> + mutex_unlock(&jtag->open_lock);
> + return 0;
> +}
> +
> +static const struct file_operations jtag_fops = {
> + .owner  = THIS_MODULE,
> + .open   = jtag_open,
> + .release= jtag_release,
> + .llseek = noop_llseek,
> + .unlocked_ioctl = jtag_ioctl,
> +};
> +
> +struct jtag *jtag_alloc(size_t priv_size, const struct jtag_ops *ops)
> +{
> + struct jtag *jtag;
> +
> + jtag = kzalloc(sizeof(*jtag), GFP_KERNEL);

Did you mean to allocate: sizeof(*jtag) + priv_size?

> + if (!jtag)
> + return NULL;
> +
> + jtag->ops = ops;
> + return jtag;
> +}
> +EXPORT_SYMBOL_GPL(jtag_alloc);
> +
> +void jtag_free(struct jtag *jtag)
> +{
> + kfree(jtag);
> +}
> +EXPORT_SYMBOL_GPL(jtag_free);
> +
> +int jtag_register(struct jtag *jtag)
> +{
> + char *name;
> + int err;
> + int id;
> +
> + id = ida_simple_get(&jtag_ida, 0, 0, GFP_KERNEL);
> + if (id < 0)
> + return id;
> +
> + jtag->id = id;
> +
> + name = kzalloc(MAX_JTAG_NAME_LEN, GFP_KERNEL);
> + if (!name) {
> + err = -ENOMEM;
> + goto err_jtag_alloc;
> + }
> +
> + err = snprintf(name, MAX_JTAG_NAME_LEN, "jtag%d", id);
> + if (err < 0)
> + goto err_jtag_name;
> +
> + mutex_init(&jtag->ope

Re: [PATCH] futex: Avoid violating the 10th rule of futex

2018-01-08 Thread Julia Cartwright
On Fri, Dec 08, 2017 at 01:49:39PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 07, 2017 at 05:02:40PM -0600, Gratian Crisan wrote:
[..]
> Assuming nothing bad happens; find below the patch with a Changelog
> attached.
> 
> ---
> Subject: futex: Avoid violating the 10th rule of futex
> From: Peter Zijlstra 
> Date: Thu Dec  7 16:54:23 CET 2017
> 
> Julia reported futex state corruption in the following scenario:
> 
>waiter  waker  
>   stealer (prio > waiter)
> 
>futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
>  timeout=[N ms])
>   futex_wait_requeue_pi()
>  futex_wait_queue_me()
> freezable_schedule()
> 
>futex(LOCK_PI, uaddr2)
>futex(CMP_REQUEUE_PI, uaddr,
>  uaddr2, 1, 0)
>   /* requeues waiter to uaddr2 */
>futex(UNLOCK_PI, uaddr2)
>  wake_futex_pi()
> 
> cmp_futex_value_locked(uaddr2, waiter)
> wake_up_q()
>
> clears sleeper->task>
>   
>  futex(LOCK_PI, uaddr2)
>   
> __rt_mutex_start_proxy_lock()
>   
>try_to_take_rt_mutex() /* steals lock */
>   
>   rt_mutex_set_owner(lock, stealer)
>   
> 
>  
>  rt_mutex_wait_proxy_lock()
> __rt_mutex_slowlock()
>try_to_take_rt_mutex() /* fails, lock held by stealer */
>if (timeout && !timeout->task)
>   return -ETIMEDOUT;
> fixup_owner()
>/* lock wasn't acquired, so,
>   fixup_pi_state_owner skipped */
> 
>return -ETIMEDOUT;
> 
>/* At this point, we've returned -ETIMEDOUT to userspace, but the
> * futex word shows waiter to be the owner, and the pi_mutex has
> * stealer as the owner */
> 
>futex_lock(LOCK_PI, uaddr2)
>  -> bails with EDEADLK, futex word says we're owner.
> 
> And suggested that what commit:
> 
>   73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
> 
> removes from fixup_owner() looks to be just what is needed. And indeed
> it is -- I completely missed that requeue_pi could also result in this
> case. So we need to restore that, except that subsequent patches, like
> commit:
> 
>   16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")
> 
> changed all the locking rules. Even without that, the sequence:
> 
> -   if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
> -   locked = 1;
> -   goto out;
> -   }
> 
> -   raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
> -   owner = rt_mutex_owner(&q->pi_state->pi_mutex);
> -   if (!owner)
> -   owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
> -   raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
> -   ret = fixup_pi_state_owner(uaddr, q, owner);
> 
> already suggests there were races; otherwise we'd never have to look
> at next_owner.
> 
> So instead of doing 3 consecutive wait_lock sections with who knows
> what races, we do it all in a single section. Additionally, the usage
> of pi_state->owner in fixup_owner() was only safe because only the
> rt_mutex owner would modify it, which this additional case wrecks.
> 
> Luckily the values can only change away and not to the value we're
> testing, this means we can do a speculative test and double check once
> we have the wait_lock.
> 
> Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
> Reported-and-Tested-by: Julia Cartwright 
> Reported-and-Tested-by: Gratian Crisan 
> Signed-off-by: Peter Zijlstra (Intel) 

Hey Peter-

We've been running w/ this patch now without further regression.  I was
expecting to see this hit 4.15-rc* eventually, but haven't seen it land
anywhere.  Is this in the queue for 4.16, then?

Thanks!
   Julia


Re: [ANNOUNCE] 4.1.46-rt52

2017-12-11 Thread Julia Cartwright
On Mon, Dec 11, 2017 at 03:01:39PM -0600, Corey Minyard wrote:
> On 12/11/2017 02:20 PM, Julia Cartwright wrote:
> > On Mon, Dec 11, 2017 at 01:04:58PM -0600, Corey Minyard wrote:
[..]
> > > > 
> > > > Sebastian Andrzej Siewior (2):
> > > > PM / CPU: replace raw_notifier with atomic_notifier (fixup)
> > > This change breaks the compile of 4.1 on ARM (and other things using
> > > kernel/cpu_pm.c).
> > > 
> > > rcu_irq_enter_irqson() and friends doesn't exist in 4.1.  In fact,
> > > rcu_irq_enter() doesn't even exist.
> > > 
> > > Sorry I didn't get to testing this earlier, I've been focused on other 
> > > things.
> > Thanks!  My current builds apparently don't currently cover CPU_PM
> > builds, so I didn't see this :(.  Thanks for the report.
> > 
> > v4.1 does contain rcu_irq_enter(), and it looks like what we should be
> > using in 4.1 instead.
> 
> Ah, it does, I looked in the wrong place.
> 
> > I'm thinking the below should be okay; I'll be standing up a proper cpu
> > hotplugging test to give it some runtime.
> 
> The below should work.  You'll need an ARM64, ARM or MIPS system with
> the proper config settings to exercise this code, though.

Shouldn't be a problem, I've got an ARM64 board that I've been meaning
to integrate into my testing anyway.

Thanks,
   Julia


Re: [ANNOUNCE] 4.1.46-rt52

2017-12-11 Thread Julia Cartwright
On Mon, Dec 11, 2017 at 01:04:58PM -0600, Corey Minyard wrote:
> On 11/29/2017 05:13 PM, Julia Cartwright wrote:
> > Hello RT Folks!
> > 
> > I'm pleased to announce the 4.1.46-rt52 stable release.
> > 
> > You can get this release via the git tree at:
> > 
> >git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git
> > 
> >branch: v4.1-rt
> >Head SHA1: 6e737a91c1ce923d4e10db831e14cfef9a2459cc
> > 
> > Or to build 4.1.46-rt52 directly, the following patches should be applied:
> > 
> >
> > http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.tar.xz
> > 
> >
> > http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.1.46.xz
> > 
> >
> > http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.46-rt52.patch.xz
> > 
> > 
> > You can also build from 4.1.46-rt51 by applying the incremental patch:
> > 
> >
> > http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/incr/patch-4.1.46-rt51-rt52.patch.xz
> > 
> > Enjoy!
> > Julia
> > 
> > Changes from v4.1.46-rt51:
> > ---
> > Julia Cartwright (2):
> >workqueue: fixup rcu check for RT
> >Linux 4.1.46-rt52
> > 
> > Sebastian Andrzej Siewior (2):
> >PM / CPU: replace raw_notifier with atomic_notifier (fixup)
> 
> This change breaks the compile of 4.1 on ARM (and other things using
> kernel/cpu_pm.c).
> 
> rcu_irq_enter_irqson() and friends doesn't exist in 4.1.  In fact,
> rcu_irq_enter() doesn't even exist.
> 
> Sorry I didn't get to testing this earlier, I've been focused on other things.

Thanks!  My current builds apparently don't currently cover CPU_PM
builds, so I didn't see this :(.  Thanks for the report.

v4.1 does contain rcu_irq_enter(), and it looks like what we should be
using in 4.1 instead.

I'm thinking the below should be okay; I'll be standing up a proper cpu
hotplugging test to give it some runtime.

Thanks,
   Julia

-- 8< --
Subject: [PATCH] PM / CPU: replace usage of rcu_irq_{enter,exit}_irqson()

Commit c50243950f97e ("PM / CPU: replace raw_notifier with
atomic_notifier (fixup)") intended to fix up RCU usage in idle, by
telling RCU to pay attention.

However, the cherry-pick back from rt-devel included invocations of the
_irqson() variants of the rcu_irq_{enter,exit}() API, which don't exist
in 4.1.

For 4.1, simply drop the _irqson() variants and use rcu_irq_{enter,exit}()
directly instead.  This is safe because, in 4.1, these functions disable
interrupts internally.

Reported-by: Corey Minyard 
Signed-off-by: Julia Cartwright 
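
Concretely, the change described above amounts to something like this
in cpu_pm_notify() (a sketch, not the applied diff):

	static int cpu_pm_notify(enum cpu_pm_event event, int nr_to_call,
				 int *nr_calls)
	{
		int ret;

		/*
		 * __atomic_notifier_call_chain() has an RCU read-side
		 * critical section, which could be dysfunctional in cpu
		 * idle, so tell RCU to pay attention across the call.  In
		 * 4.1, rcu_irq_enter()/rcu_irq_exit() disable interrupts
		 * internally, so the plain variants suffice here.
		 */
		rcu_irq_enter();
		ret = __atomic_notifier_call_chain(&cpu_pm_notifier_chain,
						   event, NULL, nr_to_call,
						   nr_calls);
		rcu_irq_exit();

		return notifier_to_errno(ret);
	}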

Re: PI futexes + lock stealing woes

2017-12-07 Thread Julia Cartwright
On Thu, Dec 07, 2017 at 08:57:59AM -0600, Gratian Crisan wrote:
> 
> Peter Zijlstra writes:
> > The below compiles and boots, but is otherwise untested. Could you give
> > it a spin?
> 
> Thank you! Yes, I'll start a test now.

I gave it a spin with my minimal reproducing case w/ the strategic
msleep().  Previously, I could reproduce it in less than a minute in
that setup; now it's been running for a couple hours.  So ... looks
good? :)

Thanks,
   Julia


[PATCH] trace-cmd: add plugin for decoding syscalls/sys_enter_futex

2017-12-06 Thread Julia Cartwright
The futex syscall is a complicated one.  It supports thirteen
multiplexed operations, each with different semantics and encodings for
the syscall's six arguments.

Manually decoding these arguments is tedious and error prone.

This plugin provides symbolic names for futex operations, futex flags,
and tries to be intelligent about the intent of specific arguments (for
example, waking operations use 'val' as an integer count, not just an
arbitrary value).

It doesn't do a full decode of FUTEX_WAKE_OP's 'val3' argument;
however, this is a good starting point.

Signed-off-by: Julia Cartwright 
---
 Makefile   |   1 +
 plugin_futex.c | 135 +
 2 files changed, 136 insertions(+)
 create mode 100644 plugin_futex.c

diff --git a/Makefile b/Makefile
index 5c35143c8867..58b957d9a578 100644
--- a/Makefile
+++ b/Makefile
@@ -365,6 +365,7 @@ PLUGIN_OBJS += plugin_kvm.o
 PLUGIN_OBJS += plugin_mac80211.o
 PLUGIN_OBJS += plugin_sched_switch.o
 PLUGIN_OBJS += plugin_function.o
+PLUGIN_OBJS += plugin_futex.o
 PLUGIN_OBJS += plugin_xen.o
 PLUGIN_OBJS += plugin_scsi.o
 PLUGIN_OBJS += plugin_cfg80211.o
diff --git a/plugin_futex.c b/plugin_futex.c
new file mode 100644
index ..b12c92d91a86
--- /dev/null
+++ b/plugin_futex.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright (C) 2017 National Instruments Corp.
+ *
+ * Author: Julia Cartwright 
+ *
+ * ~~
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License (not later!)
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this program; if not,  see <http://www.gnu.org/licenses>
+ *
+ * ~~
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include "trace-cmd.h"
+
+struct futex_args {
+   unsigned long long  uaddr;
+   unsigned long long  op;
+   unsigned long long  val;
+   unsigned long long  utime; /* or val2 */
+   unsigned long long  uaddr2;
+   unsigned long long  val3;
+};
+
+struct futex_op {
+   const char  *name;
+   const char  *fmt_val;
+   const char  *fmt_utime;
+   const char  *fmt_uaddr2;
+   const char  *fmt_val3;
+};
+
+static const struct futex_op futex_op_tbl[] = {
+   {"FUTEX_WAIT", " val=0x%08llx", " utime=0x%08llx",  
 NULL, NULL },
+   {"FUTEX_WAKE", " val=%llu",  NULL,  
 NULL, NULL },
+   {  "FUTEX_FD", " val=%llu",  NULL,  
 NULL, NULL },
+   { "FUTEX_REQUEUE", " val=%llu",  " val2=%llu", " 
uaddr2=0x%08llx", NULL },
+   { "FUTEX_CMP_REQUEUE", " val=%llu",  " val2=%llu", " 
uaddr2=0x%08llx", " val3=0x%08llx" },
+   { "FUTEX_WAKE_OP", " val=%llu",  " val2=%llu", " 
uaddr2=0x%08llx", " val3=0x%08llx" },
+   { "FUTEX_LOCK_PI",NULL, " utime=0x%08llx",  
 NULL, NULL },
+   {   "FUTEX_UNLOCK_PI",NULL,  NULL,  
 NULL, NULL },
+   {  "FUTEX_TRYLOCK_PI",NULL,  NULL,  
 NULL, NULL },
+   { "FUTEX_WAIT_BITSET", " val=0x%08llx", " utime=0x%08llx",  
 NULL, " val3=0x%08llx" },
+   { "FUTEX_WAKE_BITSET", " val=%llu",  NULL,  
 NULL, " val3=0x%08llx" },
+   { "FUTEX_WAIT_REQUEUE_PI", " val=0x%08llx", " utime=0x%08llx", " 
uaddr2=0x%08llx", " val3=0x%08llx" },
+   {  "FUTEX_CMP_REQUEUE_PI", " val=%llu",  " val2=%llu", " 
uaddr2=0x%08llx", " val3=0x%08llx" },
+};
+
+
+static void futex_print(struct trace_seq *s, const struct futex_args *args,
+   const struct futex_op *fop)
+{
+   trace_seq_printf(s, " uaddr=0x%08llx", args->uaddr);
+
+   if (fop->fmt_val)
+  

[PATCH v2 2/3] net: macb: reduce scope of rx_fs_lock-protected regions

2017-12-05 Thread Julia Cartwright
Commit ae8223de3df5 ("net: macb: Added support for RX filtering")
introduces a lock, rx_fs_lock, which is intended to protect the list of
rx_flow items and synchronize access to the hardware rx filtering
registers.

However, the region protected by this lock is overscoped, unnecessarily
including things like slab allocation.  Reduce this lock scope to only
include operations which must be performed atomically: list traversal,
addition, and removal, and hitting the macb filtering registers.

This fixes the use of kmalloc w/ GFP_KERNEL in atomic context.

Fixes: ae8223de3df5 ("net: macb: Added support for RX filtering")
Cc: Rafal Ozieblo 
Cc: Julia Lawall 
Acked-by: Nicolas Ferre 
Signed-off-by: Julia Cartwright 
---
 drivers/net/ethernet/cadence/macb_main.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index b7644836aba1..758e8b3042b2 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2796,6 +2796,7 @@ static int gem_add_flow_filter(struct net_device *netdev,
struct macb *bp = netdev_priv(netdev);
struct ethtool_rx_flow_spec *fs = &cmd->fs;
struct ethtool_rx_fs_item *item, *newfs;
+   unsigned long flags;
int ret = -EINVAL;
bool added = false;
 
@@ -2811,6 +2812,8 @@ static int gem_add_flow_filter(struct net_device *netdev,
htonl(fs->h_u.tcp_ip4_spec.ip4dst),
htons(fs->h_u.tcp_ip4_spec.psrc), 
htons(fs->h_u.tcp_ip4_spec.pdst));
 
+   spin_lock_irqsave(&bp->rx_fs_lock, flags);
+
/* find correct place to add in list */
list_for_each_entry(item, &bp->rx_fs_list.list, list) {
if (item->fs.location > newfs->fs.location) {
@@ -2833,9 +2836,11 @@ static int gem_add_flow_filter(struct net_device *netdev,
if (netdev->features & NETIF_F_NTUPLE)
gem_enable_flow_filters(bp, 1);
 
+   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
return 0;
 
 err:
+   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
kfree(newfs);
return ret;
 }
@@ -2846,6 +2851,9 @@ static int gem_del_flow_filter(struct net_device *netdev,
struct macb *bp = netdev_priv(netdev);
struct ethtool_rx_fs_item *item;
struct ethtool_rx_flow_spec *fs;
+   unsigned long flags;
+
+   spin_lock_irqsave(&bp->rx_fs_lock, flags);
 
list_for_each_entry(item, &bp->rx_fs_list.list, list) {
if (item->fs.location == cmd->fs.location) {
@@ -2862,12 +2870,14 @@ static int gem_del_flow_filter(struct net_device 
*netdev,
gem_writel_n(bp, SCRT2, fs->location, 0);
 
list_del(&item->list);
-   kfree(item);
bp->rx_fs_list.count--;
+   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
+   kfree(item);
return 0;
}
}
 
+   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
return -EINVAL;
 }
 
@@ -2936,11 +2946,8 @@ static int gem_get_rxnfc(struct net_device *netdev, 
struct ethtool_rxnfc *cmd,
 static int gem_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
 {
struct macb *bp = netdev_priv(netdev);
-   unsigned long flags;
int ret;
 
-   spin_lock_irqsave(&bp->rx_fs_lock, flags);
-
switch (cmd->cmd) {
case ETHTOOL_SRXCLSRLINS:
if ((cmd->fs.location >= bp->max_tuples)
@@ -2959,7 +2966,6 @@ static int gem_set_rxnfc(struct net_device *netdev, 
struct ethtool_rxnfc *cmd)
ret = -EOPNOTSUPP;
}
 
-   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
return ret;
 }
 
-- 
2.14.2



[PATCH v2 0/3] macb rx filter cleanups

2017-12-05 Thread Julia Cartwright
Here's a proper patchset based on net-next.

v1 -> v2:
  - Rebased on net-next
  - Add Nicolas's Acks
  - Reorder commits, putting the list_empty() cleanups prior to the
others.
  - Added commit reverting the GFP_ATOMIC change.

Julia Cartwright (3):
  net: macb: kill useless use of list_empty()
  net: macb: reduce scope of rx_fs_lock-protected regions
  net: macb: change GFP_ATOMIC to GFP_KERNEL

 drivers/net/ethernet/cadence/macb_main.c | 47 
 1 file changed, 23 insertions(+), 24 deletions(-)

-- 
2.14.2



[PATCH v2 1/3] net: macb: kill useless use of list_empty()

2017-12-05 Thread Julia Cartwright
The list_for_each_entry() macro already handles the case where the list
is empty (by not executing the loop body).  It's not necessary to handle
this case specially, so stop doing so.

Cc: Rafal Ozieblo 
Acked-by: Nicolas Ferre 
Signed-off-by: Julia Cartwright 
---
 drivers/net/ethernet/cadence/macb_main.c | 31 ---
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index ebfeab853bf4..b7644836aba1 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2812,24 +2812,20 @@ static int gem_add_flow_filter(struct net_device 
*netdev,
htons(fs->h_u.tcp_ip4_spec.psrc), 
htons(fs->h_u.tcp_ip4_spec.pdst));
 
/* find correct place to add in list */
-   if (list_empty(&bp->rx_fs_list.list))
-   list_add(&newfs->list, &bp->rx_fs_list.list);
-   else {
-   list_for_each_entry(item, &bp->rx_fs_list.list, list) {
-   if (item->fs.location > newfs->fs.location) {
-   list_add_tail(&newfs->list, &item->list);
-   added = true;
-   break;
-   } else if (item->fs.location == fs->location) {
-   netdev_err(netdev, "Rule not added: location %d 
not free!\n",
-   fs->location);
-   ret = -EBUSY;
-   goto err;
-   }
+   list_for_each_entry(item, &bp->rx_fs_list.list, list) {
+   if (item->fs.location > newfs->fs.location) {
+   list_add_tail(&newfs->list, &item->list);
+   added = true;
+   break;
+   } else if (item->fs.location == fs->location) {
+   netdev_err(netdev, "Rule not added: location %d not 
free!\n",
+   fs->location);
+   ret = -EBUSY;
+   goto err;
}
-   if (!added)
-   list_add_tail(&newfs->list, &bp->rx_fs_list.list);
}
+   if (!added)
+   list_add_tail(&newfs->list, &bp->rx_fs_list.list);
 
gem_prog_cmp_regs(bp, fs);
bp->rx_fs_list.count++;
@@ -2851,9 +2847,6 @@ static int gem_del_flow_filter(struct net_device *netdev,
struct ethtool_rx_fs_item *item;
struct ethtool_rx_flow_spec *fs;
 
-   if (list_empty(&bp->rx_fs_list.list))
-   return -EINVAL;
-
list_for_each_entry(item, &bp->rx_fs_list.list, list) {
if (item->fs.location == cmd->fs.location) {
/* disable screener regs for the flow entry */
-- 
2.14.2



[PATCH v2 3/3] net: macb: change GFP_ATOMIC to GFP_KERNEL

2017-12-05 Thread Julia Cartwright
Now that the rx_fs_lock is no longer held across allocation, it's safe
to use GFP_KERNEL for allocating new entries.

This reverts commit 81da3bf6e3f88 ("net: macb: change GFP_KERNEL to
GFP_ATOMIC").

Cc: Julia Lawall 
Signed-off-by: Julia Cartwright 
---
 drivers/net/ethernet/cadence/macb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index 758e8b3042b2..234667eaaa92 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2800,7 +2800,7 @@ static int gem_add_flow_filter(struct net_device *netdev,
int ret = -EINVAL;
bool added = false;
 
-   newfs = kmalloc(sizeof(*newfs), GFP_ATOMIC);
+   newfs = kmalloc(sizeof(*newfs), GFP_KERNEL);
if (newfs == NULL)
return -ENOMEM;
memcpy(&newfs->fs, fs, sizeof(newfs->fs));
-- 
2.14.2



[PATCH 2/2] net: macb: kill useless use of list_empty()

2017-12-05 Thread Julia Cartwright
The list_for_each_entry() macro already handles the case where the list
is empty (by not executing the loop body).  It's not necessary to handle
this case specially, so stop doing so.

Cc: Rafal Ozieblo 
Signed-off-by: Julia Cartwright 
---
This is an additional cleanup patch found when looking at this code.

   Julia

 drivers/net/ethernet/cadence/macb_main.c | 34 
 1 file changed, 13 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index e7ef104a077d..3643c6ad2322 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2815,25 +2815,22 @@ static int gem_add_flow_filter(struct net_device 
*netdev,
spin_lock_irqsave(&bp->rx_fs_lock, flags);
 
/* find correct place to add in list */
-   if (list_empty(&bp->rx_fs_list.list))
-   list_add(&newfs->list, &bp->rx_fs_list.list);
-   else {
-   list_for_each_entry(item, &bp->rx_fs_list.list, list) {
-   if (item->fs.location > newfs->fs.location) {
-   list_add_tail(&newfs->list, &item->list);
-   added = true;
-   break;
-   } else if (item->fs.location == fs->location) {
-   netdev_err(netdev, "Rule not added: location %d 
not free!\n",
-   fs->location);
-   ret = -EBUSY;
-   goto err;
-   }
+   list_for_each_entry(item, &bp->rx_fs_list.list, list) {
+   if (item->fs.location > newfs->fs.location) {
+   list_add_tail(&newfs->list, &item->list);
+   added = true;
+   break;
+   } else if (item->fs.location == fs->location) {
+   netdev_err(netdev, "Rule not added: location %d not 
free!\n",
+   fs->location);
+   ret = -EBUSY;
+   goto err;
}
-   if (!added)
-   list_add_tail(&newfs->list, &bp->rx_fs_list.list);
}
 
+   if (!added)
+   list_add_tail(&newfs->list, &bp->rx_fs_list.list);
+
gem_prog_cmp_regs(bp, fs);
bp->rx_fs_list.count++;
/* enable filtering if NTUPLE on */
@@ -2859,11 +2856,6 @@ static int gem_del_flow_filter(struct net_device *netdev,
 
spin_lock_irqsave(&bp->rx_fs_lock, flags);
 
-   if (list_empty(&bp->rx_fs_list.list)) {
-   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
-   return -EINVAL;
-   }
-
list_for_each_entry(item, &bp->rx_fs_list.list, list) {
if (item->fs.location == cmd->fs.location) {
/* disable screener regs for the flow entry */
-- 
2.14.2


[PATCH 1/2] net: macb: reduce scope of rx_fs_lock-protected regions

2017-12-05 Thread Julia Cartwright
Commit ae8223de3df5 ("net: macb: Added support for RX filtering")
introduces a lock, rx_fs_lock, which is intended to protect the list of
rx_flow items and synchronize access to the hardware rx filtering
registers.

However, the region protected by this lock is overscoped, unnecessarily
including things like slab allocation.  Reduce this lock scope to only
include operations which must be performed atomically: list traversal,
addition, and removal, and hitting the macb filtering registers.

This fixes the use of kmalloc w/ GFP_KERNEL in atomic context.

Fixes: ae8223de3df5 ("net: macb: Added support for RX filtering")
Cc: Rafal Ozieblo 
Cc: Julia Lawall 
Signed-off-by: Julia Cartwright 
---
While Julia Lawall's cocci-generated patch fixes the problem, the right
solution is to obviate the problem altogether.

Thanks,
   The Other Julia

 drivers/net/ethernet/cadence/macb_main.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index c5fa87cdc6c4..e7ef104a077d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2796,6 +2796,7 @@ static int gem_add_flow_filter(struct net_device *netdev,
struct macb *bp = netdev_priv(netdev);
struct ethtool_rx_flow_spec *fs = &cmd->fs;
struct ethtool_rx_fs_item *item, *newfs;
+   unsigned long flags;
int ret = -EINVAL;
bool added = false;
 
@@ -2811,6 +2812,8 @@ static int gem_add_flow_filter(struct net_device *netdev,
htonl(fs->h_u.tcp_ip4_spec.ip4dst),
htons(fs->h_u.tcp_ip4_spec.psrc), 
htons(fs->h_u.tcp_ip4_spec.pdst));
 
+   spin_lock_irqsave(&bp->rx_fs_lock, flags);
+
/* find correct place to add in list */
if (list_empty(&bp->rx_fs_list.list))
list_add(&newfs->list, &bp->rx_fs_list.list);
@@ -2837,9 +2840,11 @@ static int gem_add_flow_filter(struct net_device *netdev,
if (netdev->features & NETIF_F_NTUPLE)
gem_enable_flow_filters(bp, 1);
 
+   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
return 0;
 
 err:
+   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
kfree(newfs);
return ret;
 }
@@ -2850,9 +2855,14 @@ static int gem_del_flow_filter(struct net_device *netdev,
struct macb *bp = netdev_priv(netdev);
struct ethtool_rx_fs_item *item;
struct ethtool_rx_flow_spec *fs;
+   unsigned long flags;
 
-   if (list_empty(&bp->rx_fs_list.list))
+   spin_lock_irqsave(&bp->rx_fs_lock, flags);
+
+   if (list_empty(&bp->rx_fs_list.list)) {
+   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
return -EINVAL;
+   }
 
list_for_each_entry(item, &bp->rx_fs_list.list, list) {
if (item->fs.location == cmd->fs.location) {
@@ -2869,12 +2879,14 @@ static int gem_del_flow_filter(struct net_device 
*netdev,
gem_writel_n(bp, SCRT2, fs->location, 0);
 
list_del(&item->list);
-   kfree(item);
bp->rx_fs_list.count--;
+   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
+   kfree(item);
return 0;
}
}
 
+   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
return -EINVAL;
 }
 
@@ -2943,11 +2955,8 @@ static int gem_get_rxnfc(struct net_device *netdev, 
struct ethtool_rxnfc *cmd,
 static int gem_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
 {
struct macb *bp = netdev_priv(netdev);
-   unsigned long flags;
int ret;
 
-   spin_lock_irqsave(&bp->rx_fs_lock, flags);
-
switch (cmd->cmd) {
case ETHTOOL_SRXCLSRLINS:
if ((cmd->fs.location >= bp->max_tuples)
@@ -2966,7 +2975,6 @@ static int gem_set_rxnfc(struct net_device *netdev, 
struct ethtool_rxnfc *cmd)
ret = -EOPNOTSUPP;
}
 
-   spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
return ret;
 }
 
-- 
2.14.2


Re: PI futexes + lock stealing woes

2017-12-01 Thread Julia Cartwright
On Fri, Dec 01, 2017 at 12:11:15PM -0800, Darren Hart wrote:
> On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
> > Hey Thomas, Peter-
> > 
> > Gratian and I have been debugging into a nasty and difficult race w/
> > futexes seemingly the culprit.  The original symptom we were seeing
> > was a seemingly spurious -EDEADLK from a futex(LOCK_PI) operation.
> > 
> > On further analysis, however, it appears the thread which gets the
> > spurious -EDEADLK has observed a weird futex state: a prior
> > futex(WAIT_REQUEUE_PI) operation has returned -ETIMEDOUT, but the uaddr2
> > futex word owner field indicates that it's the owner.
> > 
> 
> Do you have a reproducer you can share?

We have a massive application which seems to reproduce it in 8 hours or
so, but it's not in a state to be shared :(.  So far, every attempt at
creating a simple, smaller reproducing case has failed.  We're still
trying, though :(.

One debugging technique we're also trying, now that we think we have a
handle on the race, is to pry the race window open with some
strategically placed spinning (or fixed-period sleeping).  Hopefully
that will make it easier to reproduce ...

> > Here's an attempt to boil down this situation into a pseudo trace; I'm
> > happy to forward along the full traces as well, if that would be
> > helpful:
> 
> Please do forward the full trace

Will do.  Chances are they are large enough to bounce from LKML, but
I'll send them along privately.

> > 
> >waiter  waker
> > stealer (prio > waiter)
> > 
> >futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
> >  timeout=[N ms])
> >   futex_wait_requeue_pi()
> >  futex_wait_queue_me()
> > freezable_schedule()
> > 
> >futex(LOCK_PI, uaddr2)
> >futex(CMP_REQUEUE_PI, uaddr,
> >  uaddr2, 1, 0)
> >   /* requeues waiter to uaddr2 
> > */
> >futex(UNLOCK_PI, uaddr2)
> >  wake_futex_pi()
> > 
> > cmp_futex_value_locked(uaddr, waiter)

minor fix: the above should have been:  
cmp_futex_value_locked(uaddr2, waiter)

> > wake_up_q()
> >
> > > clears sleeper->task>
> > 
> >futex(LOCK_PI, uaddr2)
> > 
> >   __rt_mutex_start_proxy_lock()
> > 
> >  try_to_take_rt_mutex() /* steals lock */
> > 
> > rt_mutex_set_owner(lock, stealer)
> > 
> >   
> >  
> >  rt_mutex_wait_proxy_lock()
> > __rt_mutex_slowlock()
> >try_to_take_rt_mutex() /* fails, lock held by stealer */
> >if (timeout && !timeout->task)
> >   return -ETIMEDOUT;
> > fixup_owner()
> >/* lock wasn't acquired, so,
> >   fixup_pi_state_owner skipped */
> >return -ETIMEDOUT;
> >
> >/* At this point, we've returned -ETIMEDOUT to userspace, but the
> > * futex word shows waiter to be the owner, and the pi_mutex has
> > * stealer as the owner */
>
> eee

Indeed. :(

> >futex_lock(LOCK_PI, uaddr2)
> >  -> bails with EDEADLK, futex word says we're owner.
> > 
> > At some later point in execution, the stealer gets scheduled back in and
> > will do fixup_owner() which fixes up the futex word, but at that point
> > it's too late: the waiter has already observed the wonky state.
> > 
> > fixup_owner() used to have additional seemingly relevant checks in place
> > that were removed in 73d786bd043eb ("futex: Rework inconsistent
> > rt_mutex/futex_q state").
> 
> This and the subsequent changes moving some of this out from under the 
> hb->

[ANNOUNCE] 4.1.46-rt52

2017-11-29 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.1.46-rt52 stable release.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.1-rt
  Head SHA1: 6e737a91c1ce923d4e10db831e14cfef9a2459cc

Or to build 4.1.46-rt52 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.1.46.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.46-rt52.patch.xz


You can also build from 4.1.46-rt51 by applying the incremental patch:

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/incr/patch-4.1.46-rt51-rt52.patch.xz

Enjoy!
   Julia

Changes from v4.1.46-rt51:
---
Julia Cartwright (2):
  workqueue: fixup rcu check for RT
  Linux 4.1.46-rt52

Sebastian Andrzej Siewior (2):
  PM / CPU: replace raw_notifier with atomic_notifier (fixup)
  kernel/hrtimer: migrate deferred timer on CPU down

---
 kernel/cpu_pm.c   | 7 +++
 kernel/time/hrtimer.c | 5 +
 kernel/workqueue.c| 2 +-
 localversion-rt   | 2 +-
 4 files changed, 14 insertions(+), 2 deletions(-)
---
diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index 9da42f83ee03..d1d1c3961553 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -28,8 +28,15 @@ static int cpu_pm_notify(enum cpu_pm_event event, int 
nr_to_call, int *nr_calls)
 {
int ret;
 
+   /*
+* __atomic_notifier_call_chain has a RCU read critical section, which
+* could be disfunctional in cpu idle. Copy RCU_NONIDLE code to let
+* RCU know this.
+*/
+   rcu_irq_enter_irqson();
ret = __atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
nr_to_call, nr_calls);
+   rcu_irq_exit_irqson();
 
return notifier_to_errno(ret);
 }
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 2c6be169bdc7..75c990b00525 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1951,6 +1951,11 @@ static void migrate_hrtimer_list(struct 
hrtimer_clock_base *old_base,
/* Clear the migration state bit */
timer->state &= ~HRTIMER_STATE_MIGRATE;
}
+#ifdef CONFIG_PREEMPT_RT_BASE
+   list_splice_tail(&old_base->expired, &new_base->expired);
+   if (!list_empty(&new_base->expired))
+   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+#endif
 }
 
 static void migrate_hrtimers(int scpu)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6bdcab98501c..90e261c8811e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -363,7 +363,7 @@ static void workqueue_sysfs_unregister(struct 
workqueue_struct *wq);
   "RCU or wq->mutex should be held")
 
 #define assert_rcu_or_wq_mutex_or_pool_mutex(wq)   \
-   rcu_lockdep_assert(rcu_read_lock_sched_held() ||\
+   rcu_lockdep_assert(rcu_read_lock_held() ||  \
   lockdep_is_held(&wq->mutex) ||   \
   lockdep_is_held(&wq_pool_mutex), \
   "sched RCU, wq->mutex or wq_pool_mutex should be 
held")
diff --git a/localversion-rt b/localversion-rt
index 75493460c41f..66a5ed8bf3d7 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt51
+-rt52


PI futexes + lock stealing woes

2017-11-29 Thread Julia Cartwright
Hey Thomas, Peter-

Gratian and I have been debugging into a nasty and difficult race w/
futexes seemingly the culprit.  The original symptom we were seeing
was a seemingly spurious -EDEADLK from a futex(LOCK_PI) operation.

On further analysis, however, it appears the thread which gets the
spurious -EDEADLK has observed a weird futex state: a prior
futex(WAIT_REQUEUE_PI) operation has returned -ETIMEDOUT, but the uaddr2
futex word owner field indicates that it's the owner.

Here's an attempt to boil down this situation into a pseudo trace; I'm
happy to forward along the full traces as well, if that would be
helpful:

   waiter  waker
stealer (prio > waiter)

   futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
 timeout=[N ms])
  futex_wait_requeue_pi()
 futex_wait_queue_me()
freezable_schedule()

   futex(LOCK_PI, uaddr2)
   futex(CMP_REQUEUE_PI, uaddr,
 uaddr2, 1, 0)
  /* requeues waiter to uaddr2 */
   futex(UNLOCK_PI, uaddr2)
 wake_futex_pi()

cmp_futex_value_locked(uaddr, waiter)
wake_up_q()
   
   task>

   futex(LOCK_PI, uaddr2)

  __rt_mutex_start_proxy_lock()

 try_to_take_rt_mutex() /* steals lock */

rt_mutex_set_owner(lock, stealer)

  
 
 rt_mutex_wait_proxy_lock()
__rt_mutex_slowlock()
   try_to_take_rt_mutex() /* fails, lock held by stealer */
   if (timeout && !timeout->task)
  return -ETIMEDOUT;
fixup_owner()
   /* lock wasn't acquired, so,
  fixup_pi_state_owner skipped */
   return -ETIMEDOUT;

   /* At this point, we've returned -ETIMEDOUT to userspace, but the
* futex word shows waiter to be the owner, and the pi_mutex has
* stealer as the owner */

   futex_lock(LOCK_PI, uaddr2)
 -> bails with EDEADLK, futex word says we're owner.

At some later point in execution, the stealer gets scheduled back in and
will do fixup_owner() which fixes up the futex word, but at that point
it's too late: the waiter has already observed the wonky state.

fixup_owner() used to have additional seemingly relevant checks in place
that were removed in 73d786bd043eb ("futex: Rework inconsistent
rt_mutex/futex_q state").

The actual kernel we've been testing is 4.9.33-rt23, w/ 153fbd1226fb3
("futex: Fix more put_pi_state() vs. exit_pi_state_list() races")
cherry-picked w/ PREEMPT_RT_FULL.  However, it appears that this issue
may affect v4.15-rc1?

Thoughts on how to move forward?

Nasty.

   Julia


[PATCH RT 4/4] Linux 4.1.46-rt52-rc1

2017-11-10 Thread Julia Cartwright
4.1.46-rt52-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 75493460c41f..d42746076d9b 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt51
+-rt52-rc1
-- 
2.14.2



[PATCH RT 3/4] kernel/hrtimer: migrate deferred timer on CPU down

2017-11-10 Thread Julia Cartwright
4.1.46-rt52-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
From: Sebastian Andrzej Siewior 

hrtimers, which were deferred to the softirq context, and expire between
softirq shutdown and hrtimer migration are dangling around. If the CPU
goes back up the list head will be initialized and this corrupts the
timer's list. It will remain unnoticed until a hrtimer_cancel().
This moves those timers so they will expire.

Cc: stable...@vger.kernel.org
Reported-by: Mike Galbraith 
Tested-by: Mike Galbraith 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit b3c08bffdcdd23f1b3ca8d9c01e3b8a715e03d46)
Signed-off-by: Julia Cartwright 
---
 kernel/time/hrtimer.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 2c6be169bdc7..75c990b00525 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1951,6 +1951,11 @@ static void migrate_hrtimer_list(struct 
hrtimer_clock_base *old_base,
/* Clear the migration state bit */
timer->state &= ~HRTIMER_STATE_MIGRATE;
}
+#ifdef CONFIG_PREEMPT_RT_BASE
+   list_splice_tail(&old_base->expired, &new_base->expired);
+   if (!list_empty(&new_base->expired))
+   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+#endif
 }
 
 static void migrate_hrtimers(int scpu)
-- 
2.14.2



[PATCH RT 0/4] Linux 4.1.46-rt52-rc1

2017-11-10 Thread Julia Cartwright
Dear RT Folks,

This is the RT stable review cycle of patch 4.1.46-rt52-rc1.  Please review
the included patches, and test!

You might ask: "Where was 4.1.46-rt51?".  The answer is: it was released
silently, and only as a means to facilitate release-level bisection.
Version 4.1.46-rt51 has known issues to be resolved by the pending
release of -rt52.

The -rc release will be uploaded to kernel.org and will be deleted when the
final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 11/15/2017.

   Julia


To build 4.1.46-rt52-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.1.46.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.46-rt52-rc1.patch.xz

You can also build from 4.1.46-rt51 release by applying the incremental patch:

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/incr/patch-4.1.46-rt51-rt52-rc1.patch.xz

Julia Cartwright (2):
  workqueue: fixup rcu check for RT
  Linux 4.1.46-rt52-rc1

Sebastian Andrzej Siewior (2):
  PM / CPU: replace raw_notifier with atomic_notifier (fixup)
  kernel/hrtimer: migrate deferred timer on CPU down

 kernel/cpu_pm.c   | 7 +++
 kernel/time/hrtimer.c | 5 +
 kernel/workqueue.c| 2 +-
 localversion-rt   | 2 +-
 4 files changed, 14 insertions(+), 2 deletions(-)

-- 
2.14.2



[PATCH RT 2/4] workqueue: fixup rcu check for RT

2017-11-10 Thread Julia Cartwright
4.1.46-rt52-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
Upstream commit 5b95e1af8d17d ("workqueue: wq_pool_mutex protects the
attrs-installation") introduced an additional assertion
(assert_rcu_or_wq_mutex_or_pool_mutex) which contains a check ensuring
that the caller is in a RCU-sched read-side critical section.

However, on RT, the locking rules are relaxed to only require
_normal_ RCU.  Fix up this check.

The upstream commit was cherry-picked back into stable v4.1.19 as d3c4dd8843be.

This fixes up the bogus splat triggered on boot:

===
[ INFO: suspicious RCU usage. ]
4.1.42-rt50
---
kernel/workqueue.c:609 sched RCU, wq->mutex or wq_pool_mutex should be held!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by swapper/0/1:
 #0:  ((pendingb_lock).lock){+.+...}, at: queue_work_on+0x64/0x1c0
 #1:  (rcu_read_lock){..}, at:  __queue_work+0x2a/0x880

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.1.42-rt50 #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-20170228_101828-anatol 04/01/2014
Call Trace:
 dump_stack+0x70/0x9a
 lockdep_rcu_suspicious+0xe7/0x120
 unbound_pwq_by_node+0x92/0x100
 __queue_work+0x28c/0x880
 ? __queue_work+0x2a/0x880
 queue_work_on+0xc9/0x1c0
 call_usermodehelper_exec+0x1a7/0x200
 kobject_uevent_env+0x4be/0x520
 ? initcall_blacklist+0xa2/0xa2
 kobject_uevent+0xb/0x10
 kset_register+0x34/0x50
 bus_register+0x100/0x2d0
 ? ftrace_define_fields_workqueue_work+0x29/0x29
 subsys_virtual_register+0x26/0x50
 wq_sysfs_init+0x12/0x14
 do_one_initcall+0x88/0x1b0
 ? parse_args+0x190/0x410
 kernel_init_freeable+0x204/0x299
 ? rest_init+0x140/0x140
 kernel_init+0x9/0xf0
 ret_from_fork+0x42/0x70
 ? rest_init+0x140/0x140

Reported-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6bdcab98501c..90e261c8811e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -363,7 +363,7 @@ static void workqueue_sysfs_unregister(struct 
workqueue_struct *wq);
   "RCU or wq->mutex should be held")
 
 #define assert_rcu_or_wq_mutex_or_pool_mutex(wq)   \
-   rcu_lockdep_assert(rcu_read_lock_sched_held() ||\
+   rcu_lockdep_assert(rcu_read_lock_held() ||  \
   lockdep_is_held(&wq->mutex) ||   \
   lockdep_is_held(&wq_pool_mutex), \
   "sched RCU, wq->mutex or wq_pool_mutex should be 
held")
-- 
2.14.2



[PATCH RT 1/4] PM / CPU: replace raw_notifier with atomic_notifier

2017-11-10 Thread Julia Cartwright
4.1.46-rt52-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
From: Sebastian Andrzej Siewior 

The original patch changed between its posting and what finally went into
Rafael's tree, so here is the delta.

Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit f648e23dac72deef07f25e05fc09dbbc209dbd33)
Signed-off-by: Julia Cartwright 
---
 kernel/cpu_pm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index 9da42f83ee03..d1d1c3961553 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -28,8 +28,15 @@ static int cpu_pm_notify(enum cpu_pm_event event, int 
nr_to_call, int *nr_calls)
 {
int ret;
 
+   /*
+* __atomic_notifier_call_chain has a RCU read critical section, which
+* could be disfunctional in cpu idle. Copy RCU_NONIDLE code to let
+* RCU know this.
+*/
+   rcu_irq_enter_irqson();
ret = __atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
nr_to_call, nr_calls);
+   rcu_irq_exit_irqson();
 
return notifier_to_errno(ret);
 }
-- 
2.14.2



[PATCH] futex: Drop now unnecessary check in exit_pi_state()

2017-11-03 Thread Julia Cartwright
This check was an attempt to protect against a race with put_pi_state()
by ensuring that the pi_state_list was consistent across the
unlock/lock of pi_lock.

However, as of commit 153fbd1226fb3 ("futex: Fix more put_pi_state() vs.
exit_pi_state_list() races"), this check is no longer necessary because
we now hold a reference to the pi_state object across the unlock/lock of
pi_lock.  This reference guarantees that a put_pi_state() on another CPU
won't rip the pi_state object from the list when we drop pi_lock.

Cc: Gratian Crisan 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Darren Hart 
Signed-off-by: Julia Cartwright 
---
I'm not sure my analysis is 100% correct here, so please carefully think
through it, as I'm sure you all always do when futex patches hit your
mailbox :).

   Julia

 kernel/futex.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index ca5bb9cba5cf..e127ec0555b6 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -889,7 +889,7 @@ static struct task_struct *futex_find_get_task(pid_t pid)
  */
 void exit_pi_state_list(struct task_struct *curr)
 {
-   struct list_head *next, *head = &curr->pi_state_list;
+   struct list_head *head = &curr->pi_state_list;
struct futex_pi_state *pi_state;
struct futex_hash_bucket *hb;
union futex_key key = FUTEX_KEY_INIT;
@@ -903,8 +903,7 @@ void exit_pi_state_list(struct task_struct *curr)
 */
raw_spin_lock_irq(&curr->pi_lock);
while (!list_empty(head)) {
-   next = head->next;
-   pi_state = list_entry(next, struct futex_pi_state, list);
+   pi_state = list_first_entry(head, struct futex_pi_state, list);
key = pi_state->key;
hb = hash_futex(&key);
 
@@ -929,17 +928,6 @@ void exit_pi_state_list(struct task_struct *curr)
spin_lock(&hb->lock);
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
raw_spin_lock(&curr->pi_lock);
-   /*
-* We dropped the pi-lock, so re-check whether this
-* task still owns the PI-state:
-*/
-   if (head->next != next) {
-   /* retain curr->pi_lock for the loop invariant */
-   raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
-   spin_unlock(&hb->lock);
-   put_pi_state(pi_state);
-   continue;
-   }
 
WARN_ON(pi_state->owner != curr);
WARN_ON(list_empty(&pi_state->list));
-- 
2.14.2



Re: [PATCH 1/2] IB/hfi1: Use preempt_{dis,en}able_nort()

2017-10-05 Thread Julia Cartwright
On Thu, Oct 05, 2017 at 01:53:05PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Oct 05, 2017 at 09:17:44AM -0500, Julia Cartwright escreveu:
> > On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:
> > > +++ b/drivers/infiniband/hw/hfi1/pio.c
> > > @@ -1421,7 +1421,7 @@ struct pio_buf *sc_buffer_alloc(struct send_context 
> > > *sc, u32 dw_len,
> 
> > >   /* there is enough room */
> 
> > > - preempt_disable();
> > > + preempt_disable_nort();
> > >   this_cpu_inc(*sc->buffers_allocated);
> 
> > Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?
> 
> No
> 
> > I believe that the this_cpu_* operations perform a preemption check, which 
> > we'd trip.
>
> Humm, looking at include/linux/percpu-defs.h on v4.11.12-rt14 I see
> (trimmed to what we're discussing here):
>
> #ifdef CONFIG_DEBUG_PREEMPT
> extern void __this_cpu_preempt_check(const char *op);
> #else
> static inline void __this_cpu_preempt_check(const char *op) { }
> #endif
>
> #define __this_cpu_add(pcp, val)  \
> ({\
>   __this_cpu_preempt_check("add");\
>   raw_cpu_add(pcp, val);  \
> })
> #define __this_cpu_inc(pcp)   __this_cpu_add(pcp, 1)
>
> /*
>  * Operations with implied preemption/interrupt protection.  These
>  * operations can be used without worrying about preemption or interrupt.
>  */
> #define this_cpu_add(pcp, val)  __pcpu_size_call(this_cpu_add_, pcp, 
> val)
> #define this_cpu_inc(pcp)   this_cpu_add(pcp, 1)
>
> > You may also have to change these to the non-preempt checked variants.
>
> So __this_cpu_inc() checks preemption but this_cpu_inc() doesn't and
> thus we're ok here? Or am I getting lost in this maze of defines? :-)

I think I was the one lost in the maze.  You are correct.  Sorry for the
confusion.

My mind expected that the __-prefixed versions would be the "raw"
versions that are free of checks, with the checks made in the
non-prefixed versions, but it is the other way around.

I'm happy to accept that the bug is within my mind.
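
For the record, a condensed sketch of which per-cpu accessors perform the
CONFIG_DEBUG_PREEMPT check, based on the percpu-defs.h excerpts quoted
above ('counter' is just a hypothetical per-cpu variable):

  DEFINE_PER_CPU(int, counter);

  this_cpu_inc(counter);    /* preemption-safe by itself; no debug check needed */
  __this_cpu_inc(counter);  /* caller must have preemption disabled; this is the
                             * variant that calls __this_cpu_preempt_check() */
  raw_cpu_inc(counter);     /* no protection and no check at all */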

   Julia


Re: [PATCH 1/2] IB/hfi1: Use preempt_{dis,en}able_nort()

2017-10-05 Thread Julia Cartwright
On Thu, Oct 05, 2017 at 06:16:23PM +0200, Thomas Gleixner wrote:
> On Thu, 5 Oct 2017, Julia Cartwright wrote:
> 
> > On Thu, Oct 05, 2017 at 11:55:39AM -0400, Steven Rostedt wrote:
> > > On Thu, 5 Oct 2017 10:37:59 -0500
> > > Julia Cartwright  wrote:
> > > 
> > > > On Thu, Oct 05, 2017 at 05:27:30PM +0200, Thomas Gleixner wrote:
> > > > > On Thu, 5 Oct 2017, Julia Cartwright wrote:  
> > > > > > On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo 
> > > > > > wrote:  
> > > > > > > - preempt_disable();
> > > > > > > + preempt_disable_nort();
> > > > > > >   this_cpu_inc(*sc->buffers_allocated);  
> > > > > > 
> > > > > > Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that 
> > > > > > the
> > > > > > this_cpu_* operations perform a preemption check, which we'd trip.  
> > > > > 
> > > > > Good point. Changing this to migrate_disable() would do the trick.  
> > > > 
> > > > Wouldn't we still trip the preempt check even with migration disabled?
> > > > In another thread I asked the same question: should the preemption
> > > > checks here be converted to migration-checks in RT?
> > > 
> > > Is it a "preemption check"?
> > 
> > Sorry if I was unclear, more precisely: the this_cpu_* family of
> > accessors, w/ CONFIG_DEBUG_PREEMPT currently spits out a warning when
> > the caller is invoked in a context where preemption is enabled.
> > 
> > The check is shared w/ the smp_processor_id() check, as implemented in
> > lib/smp_processor_id.c.  It effectively boils down to a check of
> > preempt_count() and irqs_disabled().
> 
> Except that on RT that check cares about the migrate disable state. You can
> invoke this_cpu_* and smp_processor_id() in preemptible/interruptible
> context because of:
> 
>   if (cpumask_equal(current->cpus_ptr, cpumask_of(this_cpu)))
>   goto out;
> 
> That's true even on mainline.
> 
> But you are right that this check needs some update because
> migrate_disable() does not immediately update the allowed cpumask IIRC.

Actually, I think it does:

   migrate_disable() ->
  p = current;
  ...
  migrate_disable_update_cpus_allowed(p) ->
  p->cpus_ptr = cpumask_of(smp_processor_id())
  ...

Perhaps it's worth a simple comment update, below.

   Julia

-- 8< --
Subject: [PATCH] kernel: sched: document smp_processor_id/this_cpu safety in
 migration-disabled context

On RT, users of smp_processor_id() and this_cpu_*() per-cpu accessors
are considered safe if the caller executes with migration disabled.  On
!RT, preempt_disable() is sufficient to make this guarantee, however on
RT, the lesser migrate_disable() is sufficient.

It is not entirely obvious which check in check_preemption_disabled()
makes this work, so document it.

Signed-off-by: Julia Cartwright 
---
 lib/smp_processor_id.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index de3b2d925473..c8091d9eb1b4 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -20,7 +20,11 @@ notrace static unsigned int check_preemption_disabled(const 
char *what1,
 
/*
 * Kernel threads bound to a single CPU can safely use
-* smp_processor_id():
+* smp_processor_id().
+*
+* In addition, threads which are currently executing within
+* a migration disabled region can safely use smp_processor_id() and
+* this_cpu accessors.
 */
if (cpumask_equal(current->cpus_ptr, cpumask_of(this_cpu)))
goto out;
-- 
2.14.1


Re: [PATCH 1/2] IB/hfi1: Use preempt_{dis,en}able_nort()

2017-10-05 Thread Julia Cartwright
On Thu, Oct 05, 2017 at 11:55:39AM -0400, Steven Rostedt wrote:
> On Thu, 5 Oct 2017 10:37:59 -0500
> Julia Cartwright  wrote:
> 
> > On Thu, Oct 05, 2017 at 05:27:30PM +0200, Thomas Gleixner wrote:
> > > On Thu, 5 Oct 2017, Julia Cartwright wrote:  
> > > > On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo 
> > > > wrote:  
> > > > > - preempt_disable();
> > > > > + preempt_disable_nort();
> > > > >   this_cpu_inc(*sc->buffers_allocated);  
> > > > 
> > > > Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that the
> > > > this_cpu_* operations perform a preemption check, which we'd trip.  
> > > 
> > > Good point. Changing this to migrate_disable() would do the trick.  
> > 
> > Wouldn't we still trip the preempt check even with migration disabled?
> > In another thread I asked the same question: should the preemption
> > checks here be converted to migration-checks in RT?
> 
> Is it a "preemption check"?

Sorry if I was unclear, more precisely: the this_cpu_* family of
accessors, w/ CONFIG_DEBUG_PREEMPT currently spits out a warning when
the caller is invoked in a context where preemption is enabled.

The check is shared w/ the smp_processor_id() check, as implemented in
lib/smp_processor_id.c.  It effectively boils down to a check of
preempt_count() and irqs_disabled().
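
Roughly, an abridged sketch of check_preemption_disabled() as it looks in
the RT tree (see lib/smp_processor_id.c for the real thing; the ordering
of the checks here is only illustrative):

  if (likely(preempt_count()))
          goto out;       /* preemption disabled: fine */
  if (irqs_disabled())
          goto out;       /* hard interrupts off: fine */
  if (cpumask_equal(current->cpus_ptr, cpumask_of(this_cpu)))
          goto out;       /* bound to this CPU: fine (the case RT's
                           * migrate_disable() relies on) */
  /* otherwise: emit the DEBUG_PREEMPT splat */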

> Getting a cpu # should only care about migration.

I think we're agreeing? :)

> This isn't the same as a rcu_sched check is it? That does care about
> preemption.

This is something totally different, I think.

   Julia


Re: [PATCH 1/2] IB/hfi1: Use preempt_{dis,en}able_nort()

2017-10-05 Thread Julia Cartwright
On Thu, Oct 05, 2017 at 05:27:30PM +0200, Thomas Gleixner wrote:
> On Thu, 5 Oct 2017, Julia Cartwright wrote:
> > On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:
> > > - preempt_disable();
> > > + preempt_disable_nort();
> > >   this_cpu_inc(*sc->buffers_allocated);
> > 
> > Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that the
> > this_cpu_* operations perform a preemption check, which we'd trip.
> 
> Good point. Changing this to migrate_disable() would do the trick.

Wouldn't we still trip the preempt check even with migration disabled?
In another thread I asked the same question: should the preemption
checks here be converted to migration-checks in RT?

   Julia


Re: [PATCH 1/2] IB/hfi1: Use preempt_{dis,en}able_nort()

2017-10-05 Thread Julia Cartwright
On Tue, Oct 03, 2017 at 12:49:19PM -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo 
> 
> sc_buffer_alloc() disables preemption that will be reenabled by either
> pio_copy() or seg_pio_copy_end(). But before disabling preemption it
> grabs a spin lock that will be dropped after it disables preemption,
> which ends up triggering a warning in migrate_disable() later on.
> 
> spin_lock_irqsave(&sc->alloc_lock)
>   migrate_disable() ++p->migrate_disable -> 2
> preempt_disable()
> spin_unlock_irqrestore(&sc->alloc_lock)
>   migrate_enable() in_atomic(), so just returns, migrate_disable stays at 
> 2
> spin_lock_irqsave(some other lock) -> b00m
> 
> And the WARN_ON code ends up tripping over this over and over in
> log_store().
> 
> Sequence captured via ftrace_dump_on_oops + crash utility 'dmesg'
> command.
> 
> [512258.613862] sm-3297 16 .11 359465349134644: sc_buffer_alloc 
> <-hfi1_verbs_send_pio
> [512258.613876] sm-3297 16 .11 359465349134719: migrate_disable 
> <-sc_buffer_alloc
> [512258.613890] sm-3297 16 .12 359465349134798: rt_spin_lock 
> <-sc_buffer_alloc
> [512258.613903] sm-3297 16 112 359465349135481: rt_spin_unlock 
> <-sc_buffer_alloc
> [512258.613916] sm-3297 16 112 359465349135556: migrate_enable 
> <-sc_buffer_alloc
> [512258.613935] sm-3297 16 112 359465349135788: seg_pio_copy_start 
> <-hfi1_verbs_send_pio
> [512258.613954] sm-3297 16 112 359465349136273: update_sge 
> <-hfi1_verbs_send_pio
> [512258.613981] sm-3297 16 112 359465349136373: seg_pio_copy_mid 
> <-hfi1_verbs_send_pio
> [512258.613999] sm-3297 16 112 359465349136873: update_sge 
> <-hfi1_verbs_send_pio
> [512258.614017] sm-3297 16 112 359465349136956: seg_pio_copy_mid 
> <-hfi1_verbs_send_pio
> [512258.614035] sm-3297 16 112 359465349137221: seg_pio_copy_end 
> <-hfi1_verbs_send_pio
> [512258.614048] sm-3297 16 .12 359465349137360: migrate_disable 
> <-hfi1_verbs_send_pio
> [512258.614065] sm-3297 16 .12 359465349137476: warn_slowpath_null 
> <-migrate_disable
> [512258.614081] sm-3297 16 .12 359465349137564: __warn 
> <-warn_slowpath_null
> [512258.614088] sm-3297 16 .12 359465349137958: printk <-__warn
> [512258.614096] sm-3297 16 .12 359465349138055: vprintk_default <-printk
> [512258.614104] sm-3297 16 .12 359465349138144: vprintk_emit 
> <-vprintk_default
> [512258.614111] sm-3297 16 d12 359465349138312: _raw_spin_lock 
> <-vprintk_emit
> [512258.614119] sm-3297 16 d...112 359465349138789: log_store <-vprintk_emit
> [512258.614127] sm-3297 16 .12 359465349139068: migrate_disable 
> <-vprintk_emit
> 
> According to a discussion (see Link: below) on the linux-rt-users
> mailing list, this locking is done for performance reasons, not for
> correctness, so use the _nort() variants to avoid the above problem.
> 
> Suggested-by: Julia Cartwright 
> Cc: Clark Williams 
> Cc: Dean Luick 
> Cc: Dennis Dalessandro 
> Cc: Doug Ledford 
> Cc: Kaike Wan 
> Cc: Leon Romanovsky 
> Cc: linux-r...@vger.kernel.org
> Cc: Peter Zijlstra 
> Cc: Sebastian Andrzej Siewior 
> Cc: Sebastian Sanchez 
> Cc: Steven Rostedt 
> Cc: Thomas Gleixner 
> Link: 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_20170926210045.GO29872-40jcartwri.amer.corp.natinst.com&d=DwIBaQ&c=I_0YwoKy7z5LMTVdyO6YCiE2uzI1jjZZuIPelcSjixA&r=cAXq_8W9Othb2h8ZcWv8Vw&m=zzKtWJ595HB0jyuiFic0ZEkpmmjvGRXJHkGF27oyvCI&s=J4_Al0cbvQ9PCM3VbqzJ6apmpSZI9Xx7eq6Gcfucp24&e=
>  
> Signed-off-by: Arnaldo Carvalho de Melo 
> ---
>  drivers/infiniband/hw/hfi1/pio.c  | 2 +-
>  drivers/infiniband/hw/hfi1/pio_copy.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/pio.c 
> b/drivers/infiniband/hw/hfi1/pio.c
> index 615be68e40b3..3a30bde9a07b 100644
> --- a/drivers/infiniband/hw/hfi1/pio.c
> +++ b/drivers/infiniband/hw/hfi1/pio.c
> @@ -1421,7 +1421,7 @@ struct pio_buf *sc_buffer_alloc(struct send_context 
> *sc, u32 dw_len,
>  
>   /* there is enough room */
>  
> - preempt_disable();
> + preempt_disable_nort();
>   this_cpu_inc(*sc->buffers_allocated);

Have you tried this on RT w/ CONFIG_DEBUG_PREEMPT?  I believe that the
this_cpu_* operations perform a preemption check, which we'd trip.

You may also have to change these to the non-preempt checked variants.

   Julia


Re: [PATCH v2 37/40] tracing: Add inter-event hist trigger Documentation

2017-09-20 Thread Julia Cartwright
On Tue, Sep 05, 2017 at 04:57:49PM -0500, Tom Zanussi wrote:
> Add background and details on inter-event hist triggers, including
> hist variables, synthetic events, and actions.
> 
> Signed-off-by: Tom Zanussi 
> Signed-off-by: Baohong Liu 
> ---
>  Documentation/trace/events.txt | 385 
> +
>  1 file changed, 385 insertions(+)
> 
> diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
> index f271d87..7ee720b 100644
> --- a/Documentation/trace/events.txt
> +++ b/Documentation/trace/events.txt
> @@ -571,6 +571,7 @@ The following commands are supported:
>   .sym-offset display an address as a symbol and offset
>   .syscalldisplay a syscall id as a system call name
>   .execname   display a common_pid as a program name
> + .usecs  display a $common_timestamp in microseconds
>  
>Note that in general the semantics of a given field aren't
>interpreted when applying a modifier to it, but there are some
> @@ -2101,3 +2102,387 @@ The following commands are supported:
>  Hits: 489
>  Entries: 7
>  Dropped: 0
> +
> +6.3 Inter-event hist triggers
> +-
> +
> +Inter-event hist triggers are hist triggers that combine values from
> +one or more other events and create a histogram using that data.  Data
> +from an inter-event histogram can in turn become the source for
> +further combined histograms, thus providing a chain of related
> +histograms, which is important for some applications.
> +
> +The most important example of an inter-event quantity that can be used
> +in this manner is latency, which is simply a difference in timestamps
> +between two events (although trace events don't have an externally
> +visible timestamp field, the inter-event hist trigger support adds a
> +pseudo-field to all events named '$common_timestamp' which can be used
> +as if it were an actual event field)

What's between the parentheses is covered below, is it worth mentioning
in both places?

[..]
> +  - Trace events don't have a 'timestamp' associated with them, but
> +there is an implicit timestamp saved along with an event in the
> +underlying ftrace ring buffer.  This timestamp is now exposed as a
> +a synthetic field named '$common_timestamp' which can be used in
> +histograms as if it were any other event field.  Note that it has
> +a '$' prefixed to it - this is meant to indicate that it isn't an
> +actual field in the trace format but rather is a synthesized value
> +that nonetheless can be used as if it were an actual field.  By
> +default it is in units of nanoseconds; appending '.usecs' to a
> +common_timestamp field changes the units to microseconds.
> +
> +These features are decribed in more detail in the following sections.

  ^ described

> +
> +6.3.1 Histogram Variables
> +-
> +
> +Variables are simply named locations used for saving and retrieving
> +values between matching events.  A 'matching' event is defined as an
> +event that has a matching key - if a variable is saved for a histogram
> +entry corresponding to that key, any subsequent event with a matching
> +key can access that variable.
> +
> +A variable's value is normally available to any subsequent event until
> +it is set to something else by a subsequent event.  The one exception
> +to that rule is that any variable used in an expression is essentially
> +'read-once' - once it's used by an expression in a subsequent event,
> +it's reset to its 'unset' state, which means it can't be used again
> +unless it's set again.  This ensures not only that an event doesn't
> +use an uninitialized variable in a calculation, but that that variable
> +is used only once and not for any unrelated subsequent match.
> +
> +The basic syntax for saving a variable is to simply prefix a unique
> +variable name not corresponding to any keyword along with an '=' sign
> +to any event field.

This would seem to make it much more difficult in the future to add new
keywords for hist triggers without breaking users existing triggers.
Maybe that's not a problem given the tracing ABI wild west.

[..]
> +6.3.2 Synthetic Events
> +--
> +
> +Synthetic events are user-defined events generated from hist trigger
> +variables or fields associated with one or more other events.  Their
> +purpose is to provide a mechanism for displaying data spanning
> +multiple events consistent with the existing and already familiar
> +usage for normal events.
> +
> +To define a synthetic event, the user writes a simple specification
> +consisting of the name of the new event along with one or more
> +variables and their types, which can be any valid field type,
> +separated by semicolons, to the tracing/synthetic_events file.
> +
> +For instance, the following creates a new event named 'wakeup_latency'
> +with 3 fields: lat, pid, and prio.  Each of those fields is sim

Re: [ANNOUNCE] 4.1.42-rt50

2017-08-17 Thread Julia Cartwright
On Thu, Aug 17, 2017 at 09:30:28AM +0200, Sebastian Andrzej Siewior wrote:
> On 2017-08-16 15:42:28 [-0500], Julia Cartwright wrote:
[..]
> > diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
> > index 9656a3c36503..9da42f83ee03 100644
> > --- a/kernel/cpu_pm.c
> > +++ b/kernel/cpu_pm.c
> > @@ -22,14 +22,13 @@
> >  #include 
> >  #include 
> >  
> > -static DEFINE_RWLOCK(cpu_pm_notifier_lock);
> > -static RAW_NOTIFIER_HEAD(cpu_pm_notifier_chain);
> > +static ATOMIC_NOTIFIER_HEAD(cpu_pm_notifier_chain);
> >  
> >  static int cpu_pm_notify(enum cpu_pm_event event, int nr_to_call, int 
> > *nr_calls)
> >  {
> > int ret;
> >  
> > -   ret = __raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
> > +   ret = __atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
> > nr_to_call, nr_calls);
> 
> there is a piece missing, upstream has a different change queued. I know
> that this is the same as in latest RT, this is just to let you know in
> case someone complains about an RCU backtrace…

Ah!  Thanks. Indeed.  I did see the upstream-queued version get cc'd to
linux-rt-users, but didn't see the rcu_irq_{enter,exit}_irqson()
addition, assuming it to be what landed in rt-devel.

Will you be fixing this up in 4.11-rt?

   Julia


[ANNOUNCE] 4.1.42-rt50

2017-08-16 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.1.42-rt50 stable release.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.1-rt
  Head SHA1: 2e2586f49c8f6b84ceeecce704901405a7e780df

Or to build 4.1.42-rt50 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.1.42.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.42-rt50.patch.xz


You can also build from 4.1.42-rt49 by applying the incremental patch:

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/incr/patch-4.1.42-rt49-rt50.patch.xz

Enjoy!
   Julia

Changes from v4.1.42-rt49:
---
Alex Shi (1):
  cpu_pm: replace raw_notifier to atomic_notifier

Julia Cartwright (1):
  Linux 4.1.42-rt50

Peter Zijlstra (2):
  lockdep: Fix per-cpu static objects
  sched: Remove TASK_ALL

Thomas Gleixner (2):
  rtmutex: Make lock_killable work
  sched: Prevent task state corruption by spurious lock wakeup

 include/linux/sched.h|  1 -
 include/linux/smp.h  | 12 
 init/main.c  |  8 
 kernel/cpu_pm.c  | 43 ++-
 kernel/locking/rtmutex.c | 19 +++
 kernel/module.c  |  6 +-
 kernel/sched/core.c  |  2 +-
 localversion-rt  |  2 +-
 mm/percpu.c  |  5 -
 9 files changed, 44 insertions(+), 54 deletions(-)
---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d51525ce2c41..7587d6181cd2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -227,7 +227,6 @@ extern char ___assert_task_state[1 - 2*!!(
 
 /* Convenience macros for the sake of wake_up */
 #define TASK_NORMAL(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
-#define TASK_ALL   (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)
 
 /* get_task_state() */
 #define TASK_REPORT(TASK_RUNNING | TASK_INTERRUPTIBLE | \
diff --git a/include/linux/smp.h b/include/linux/smp.h
index e6ab36aeaaab..cbf6836524dc 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -120,6 +120,13 @@ extern unsigned int setup_max_cpus;
 extern void __init setup_nr_cpu_ids(void);
 extern void __init smp_init(void);
 
+extern int __boot_cpu_id;
+
+static inline int get_boot_cpu_id(void)
+{
+   return __boot_cpu_id;
+}
+
 #else /* !SMP */
 
 static inline void smp_send_stop(void) { }
@@ -158,6 +165,11 @@ static inline void smp_init(void) { up_late_init(); }
 static inline void smp_init(void) { }
 #endif
 
+static inline int get_boot_cpu_id(void)
+{
+   return 0;
+}
+
 #endif /* !SMP */
 
 /*
diff --git a/init/main.c b/init/main.c
index 0486a8e11fc0..e1bae15a2154 100644
--- a/init/main.c
+++ b/init/main.c
@@ -451,6 +451,10 @@ void __init parse_early_param(void)
  * Activate the first processor.
  */
 
+#ifdef CONFIG_SMP
+int __boot_cpu_id;
+#endif
+
 static void __init boot_cpu_init(void)
 {
int cpu = smp_processor_id();
@@ -459,6 +463,10 @@ static void __init boot_cpu_init(void)
set_cpu_active(cpu, true);
set_cpu_present(cpu, true);
set_cpu_possible(cpu, true);
+
+#ifdef CONFIG_SMP
+   __boot_cpu_id = cpu;
+#endif
 }
 
 void __init __weak smp_setup_processor_id(void)
diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index 9656a3c36503..9da42f83ee03 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -22,14 +22,13 @@
 #include 
 #include 
 
-static DEFINE_RWLOCK(cpu_pm_notifier_lock);
-static RAW_NOTIFIER_HEAD(cpu_pm_notifier_chain);
+static ATOMIC_NOTIFIER_HEAD(cpu_pm_notifier_chain);
 
 static int cpu_pm_notify(enum cpu_pm_event event, int nr_to_call, int 
*nr_calls)
 {
int ret;
 
-   ret = __raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
+   ret = __atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
nr_to_call, nr_calls);
 
return notifier_to_errno(ret);
@@ -47,14 +46,7 @@ static int cpu_pm_notify(enum cpu_pm_event event, int 
nr_to_call, int *nr_calls)
  */
 int cpu_pm_register_notifier(struct notifier_block *nb)
 {
-   unsigned long flags;
-   int ret;
-
-   write_lock_irqsave(&cpu_pm_notifier_lock, flags);
-   ret = raw_notifier_chain_register(&cpu_pm_notifier_chain, nb);
-   write_unlock_irqrestore(&cpu_pm_notifier_lock, flags);
-
-   return ret;
+   return atomic_notifier_chain_register(&cpu_pm_notifier_chain, nb);
 }
 EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);
 
@@ -69,14 +61,7 @@ EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);
  */
 int cpu_pm_unregister_notifier(struct notifier_block *nb)
 {
-   unsigned long flags;
-   int ret;
-
-   write_lock_irqsave(&cpu_pm_notifier_lock, flags);
-   ret = raw_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
-   write_unlock_ir

[PATCH RT 2/6] rtmutex: Make lock_killable work

2017-08-07 Thread Julia Cartwright
4.1.42-rt50-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
From: Thomas Gleixner 

Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.

Use signal_pending_state() unconditionally.

Cc: rt-sta...@vger.kernel.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 9edcb2cd71ff3684755f52129538260efa382789)
Signed-off-by: Julia Cartwright 
---
 kernel/locking/rtmutex.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index e0b0d9b419b5..3e45ceb862bd 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1511,18 +1511,13 @@ __rt_mutex_slowlock(struct rt_mutex *lock, int state,
if (try_to_take_rt_mutex(lock, current, waiter))
break;
 
-   /*
-* TASK_INTERRUPTIBLE checks for signals and
-* timeout. Ignored otherwise.
-*/
-   if (unlikely(state == TASK_INTERRUPTIBLE)) {
-   /* Signal pending? */
-   if (signal_pending(current))
-   ret = -EINTR;
-   if (timeout && !timeout->task)
-   ret = -ETIMEDOUT;
-   if (ret)
-   break;
+   if (timeout && !timeout->task) {
+   ret = -ETIMEDOUT;
+   break;
+   }
+   if (signal_pending_state(state, current)) {
+   ret = -EINTR;
+   break;
}
 
if (ww_ctx && ww_ctx->acquired > 0) {
-- 
2.13.1



[PATCH RT 0/6] Linux 4.1.42-rt50-rc1

2017-08-07 Thread Julia Cartwright
Dear RT Folks,

This is the RT stable review cycle of patch 4.1.42-rt50-rc1.  Please review
the included patches, and test!

The -rc release will be uploaded to kernel.org and will be deleted when the
final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 8/14/2017.

   Julia


To build 4.1.42-rt50-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.1.42.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.42-rt50-rc1.patch.xz

You can also build from 4.1.42-rt49 release by applying the incremental patch:

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/incr/patch-4.1.42-rt49-rt50-rc1.patch.xz


Alex Shi (1):
  cpu_pm: replace raw_notifier to atomic_notifier

Julia Cartwright (1):
  Linux 4.1.42-rt50-rc1

Peter Zijlstra (2):
  lockdep: Fix per-cpu static objects
  sched: Remove TASK_ALL

Thomas Gleixner (2):
  rtmutex: Make lock_killable work
  sched: Prevent task state corruption by spurious lock wakeup

 include/linux/sched.h|  1 -
 include/linux/smp.h  | 12 
 init/main.c  |  8 
 kernel/cpu_pm.c  | 43 ++-
 kernel/locking/rtmutex.c | 19 +++
 kernel/module.c  |  6 +-
 kernel/sched/core.c  |  2 +-
 localversion-rt  |  2 +-
 mm/percpu.c  |  5 -
 9 files changed, 44 insertions(+), 54 deletions(-)

-- 
2.13.1



[PATCH RT 6/6] Linux 4.1.42-rt50-rc1

2017-08-07 Thread Julia Cartwright
4.1.42-rt50-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 4b7dca68a5b4..e8a9a36bb066 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt49
+-rt50-rc1
-- 
2.13.1



[PATCH RT 5/6] cpu_pm: replace raw_notifier to atomic_notifier

2017-08-07 Thread Julia Cartwright
4.1.42-rt50-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
From: Alex Shi 

This patch replaces a rwlock and raw notifier with an atomic notifier,
which is protected by a spinlock and RCU.

The first reason for this replacement is a 'scheduling while atomic' bug
of the RT kernel on arm/arm64 platforms. On arm/arm64, the rwlock
cpu_pm_notifier_lock in cpu_pm causes a potential schedule after irq
disable in the idle call chain:

cpu_startup_entry
  cpu_idle_loop
    local_irq_disable()
    cpuidle_idle_call
      call_cpuidle
        cpuidle_enter
          cpuidle_enter_state
            ->enter :arm_enter_idle_state
              cpu_pm_enter/exit
                CPU_PM_CPU_IDLE_ENTER
                  read_lock(&cpu_pm_notifier_lock); <-- sleep in idle
                    __rt_spin_lock();
                      schedule();

The kernel panic is here:
[4.609601] BUG: scheduling while atomic: swapper/1/0/0x0002
[4.609608] [] arm_enter_idle_state+0x18/0x70
[4.609614] Modules linked in:
[4.609615] [] cpuidle_enter_state+0xf0/0x218
[4.609620] [] cpuidle_enter+0x18/0x20
[4.609626] Preemption disabled at:
[4.609627] [] call_cpuidle+0x24/0x40
[4.609635] [] schedule_preempt_disabled+0x1c/0x28
[4.609639] [] cpu_startup_entry+0x154/0x1f8
[4.609645] [] secondary_start_kernel+0x15c/0x1a0

Daniel Lezcano said this notification is needed on arm/arm64 platforms.
Sebastian suggested using an atomic_notifier instead of the rwlock, which
not only removes the sleeping in idle but also improves latency.

This patch passed Fengguang's 0day testing.

Signed-off-by: Alex Shi 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Cc: Anders Roxell 
Cc: Rik van Riel 
Cc: Steven Rostedt 
Cc: Rafael J. Wysocki 
Cc: Daniel Lezcano 
Cc: linux-rt-users 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit df0fba5ba4c69cdc68bdaa5ca7a4100d959fdd07)
Signed-off-by: Julia Cartwright 
---
 kernel/cpu_pm.c | 43 ++-
 1 file changed, 6 insertions(+), 37 deletions(-)

diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index 9656a3c36503..9da42f83ee03 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -22,14 +22,13 @@
 #include 
 #include 
 
-static DEFINE_RWLOCK(cpu_pm_notifier_lock);
-static RAW_NOTIFIER_HEAD(cpu_pm_notifier_chain);
+static ATOMIC_NOTIFIER_HEAD(cpu_pm_notifier_chain);
 
 static int cpu_pm_notify(enum cpu_pm_event event, int nr_to_call, int 
*nr_calls)
 {
int ret;
 
-   ret = __raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
+   ret = __atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
nr_to_call, nr_calls);
 
return notifier_to_errno(ret);
@@ -47,14 +46,7 @@ static int cpu_pm_notify(enum cpu_pm_event event, int 
nr_to_call, int *nr_calls)
  */
 int cpu_pm_register_notifier(struct notifier_block *nb)
 {
-   unsigned long flags;
-   int ret;
-
-   write_lock_irqsave(&cpu_pm_notifier_lock, flags);
-   ret = raw_notifier_chain_register(&cpu_pm_notifier_chain, nb);
-   write_unlock_irqrestore(&cpu_pm_notifier_lock, flags);
-
-   return ret;
+   return atomic_notifier_chain_register(&cpu_pm_notifier_chain, nb);
 }
 EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);
 
@@ -69,14 +61,7 @@ EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);
  */
 int cpu_pm_unregister_notifier(struct notifier_block *nb)
 {
-   unsigned long flags;
-   int ret;
-
-   write_lock_irqsave(&cpu_pm_notifier_lock, flags);
-   ret = raw_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
-   write_unlock_irqrestore(&cpu_pm_notifier_lock, flags);
-
-   return ret;
+   return atomic_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
 }
 EXPORT_SYMBOL_GPL(cpu_pm_unregister_notifier);
 
@@ -100,7 +85,6 @@ int cpu_pm_enter(void)
int nr_calls;
int ret = 0;
 
-   read_lock(&cpu_pm_notifier_lock);
ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
if (ret)
/*
@@ -108,7 +92,6 @@ int cpu_pm_enter(void)
 * PM entry who are notified earlier to prepare for it.
 */
cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
-   read_unlock(&cpu_pm_notifier_lock);
 
return ret;
 }
@@ -128,13 +111,7 @@ EXPORT_SYMBOL_GPL(cpu_pm_enter);
  */
 int cpu_pm_exit(void)
 {
-   int ret;
-
-   read_lock(&cpu_pm_notifier_lock);
-   ret = cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
-   read_unlock(&cpu_pm_notifier_lock);
-
-   return ret;
+   return cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
 }
 EXPORT_SYMBOL_GPL(cpu_pm_exit);
 
@@ -159,7 +136,6 @@ int cpu_cluster_pm_enter(void)
int nr_calls;
int ret = 0;
 
-   read_lock(&cpu_pm_notifier_lock);
ret = cpu_pm

[PATCH RT 1/6] lockdep: Fix per-cpu static objects

2017-08-07 Thread Julia Cartwright
4.1.42-rt50-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
From: Peter Zijlstra 

Since commit 383776fa7527 ("locking/lockdep: Handle statically initialized
PER_CPU locks properly") we try to collapse per-cpu locks into a single
class by giving them all the same key. For this key we choose the canonical
address of the per-cpu object, which would be the offset into the per-cpu
area.

This has two problems:

 - there is a case where we run !0 lock->key through static_obj() and
   expect this to pass; it doesn't for canonical pointers.

 - 0 is a valid canonical address.

Cure both issues by redefining the canonical address as the address of the
per-cpu variable on the boot CPU.

Since I didn't want to rely on CPU0 being the boot-cpu, or even existing at
all, track the boot CPU in a variable.

Fixes: 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU 
locks properly")
Reported-by: kernel test robot 
Signed-off-by: Peter Zijlstra (Intel) 
Tested-by: Borislav Petkov 
Cc: Sebastian Andrzej Siewior 
Cc: linux...@kvack.org
Cc: w...@linux.intel.com
Cc: kernel test robot 
Cc: LKP 
Link: 
http://lkml.kernel.org/r/20170320114108.kbvcsuepem45j...@hirez.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner 
(cherry picked from commit c9fe9196079f738c89c3ffcdce3fbe142ac3f3c4)
Signed-off-by: Julia Cartwright 
---
 include/linux/smp.h | 12 
 init/main.c |  8 
 kernel/module.c |  6 +-
 mm/percpu.c |  5 -
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index e6ab36aeaaab..cbf6836524dc 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -120,6 +120,13 @@ extern unsigned int setup_max_cpus;
 extern void __init setup_nr_cpu_ids(void);
 extern void __init smp_init(void);
 
+extern int __boot_cpu_id;
+
+static inline int get_boot_cpu_id(void)
+{
+   return __boot_cpu_id;
+}
+
 #else /* !SMP */
 
 static inline void smp_send_stop(void) { }
@@ -158,6 +165,11 @@ static inline void smp_init(void) { up_late_init(); }
 static inline void smp_init(void) { }
 #endif
 
+static inline int get_boot_cpu_id(void)
+{
+   return 0;
+}
+
 #endif /* !SMP */
 
 /*
diff --git a/init/main.c b/init/main.c
index 0486a8e11fc0..e1bae15a2154 100644
--- a/init/main.c
+++ b/init/main.c
@@ -451,6 +451,10 @@ void __init parse_early_param(void)
  * Activate the first processor.
  */
 
+#ifdef CONFIG_SMP
+int __boot_cpu_id;
+#endif
+
 static void __init boot_cpu_init(void)
 {
int cpu = smp_processor_id();
@@ -459,6 +463,10 @@ static void __init boot_cpu_init(void)
set_cpu_active(cpu, true);
set_cpu_present(cpu, true);
set_cpu_possible(cpu, true);
+
+#ifdef CONFIG_SMP
+   __boot_cpu_id = cpu;
+#endif
 }
 
 void __init __weak smp_setup_processor_id(void)
diff --git a/kernel/module.c b/kernel/module.c
index a7ac858fd1a1..982c57b2c2a1 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -542,8 +542,12 @@ bool __is_module_percpu_address(unsigned long addr, 
unsigned long *can_addr)
void *va = (void *)addr;
 
if (va >= start && va < start + mod->percpu_size) {
-   if (can_addr)
+   if (can_addr) {
*can_addr = (unsigned long) (va - 
start);
+   *can_addr += (unsigned long)
+   per_cpu_ptr(mod->percpu,
+   get_boot_cpu_id());
+   }
preempt_enable();
return true;
}
diff --git a/mm/percpu.c b/mm/percpu.c
index 4146b00bfde7..b41c3960d5fb 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1297,8 +1297,11 @@ bool __is_kernel_percpu_address(unsigned long addr, 
unsigned long *can_addr)
void *va = (void *)addr;
 
if (va >= start && va < start + static_size) {
-   if (can_addr)
+   if (can_addr) {
*can_addr = (unsigned long) (va - start);
+   *can_addr += (unsigned long)
+   per_cpu_ptr(base, get_boot_cpu_id());
+   }
return true;
}
}
-- 
2.13.1



[PATCH RT 3/6] sched: Prevent task state corruption by spurious lock wakeup

2017-08-07 Thread Julia Cartwright
4.1.42-rt50-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
From: Thomas Gleixner 

Mathias and others reported GDB failures on RT.

The following scenario leads to task state corruption:

CPU0                                    CPU1

T1->state = TASK_XXX;
spin_lock(&lock)
  rt_spin_lock_slowlock(&lock->rtmutex)
    raw_spin_lock(&rtm->wait_lock);
    T1->saved_state = current->state;
    T1->state = TASK_UNINTERRUPTIBLE;
                                        spin_unlock(&lock)
    task_blocks_on_rt_mutex(rtm)          rt_spin_lock_slowunlock(&lock->rtmutex)
      queue_waiter(rtm)                     raw_spin_lock(&rtm->wait_lock);
      pi_chain_walk(rtm)
        raw_spin_unlock(&rtm->wait_lock);
                                            wake_top_waiter(T1)

      raw_spin_lock(&rtm->wait_lock);

    for (;;) {
      if (__try_to_take_rt_mutex())  <- Succeeds
        break;
      ...
    }

    T1->state = T1->saved_state;
                                        try_to_wake_up(T1)
                                          ttwu_do_wakeup(T1)
                                            T1->state = TASK_RUNNING;

In most cases this is harmless because waiting for some event, which is the
usual reason for TASK_[UN]INTERRUPTIBLE has to be safe against other forms
of spurious wakeups anyway.

But in case of TASK_TRACED this is actually fatal, because the task loses
the TASK_TRACED state. In consequence it fails to consume SIGSTOP which was
sent from the debugger and actually delivers SIGSTOP to the task which
breaks the ptrace mechanics and brings the debugger into an unexpected
state.

The TASK_TRACED state should prevent getting there due to the state
matching logic in try_to_wake_up(). But that's not true because
wake_up_lock_sleeper() uses TASK_ALL as state mask. That's bogus because
lock sleepers always use TASK_UNINTERRUPTIBLE, so the wakeup should use
that as well.

The cure is way simpler as figuring it out:

Change the mask used in wake_up_lock_sleeper() from TASK_ALL to
TASK_UNINTERRUPTIBLE.

Cc: stable...@vger.kernel.org
Reported-by: Mathias Koehrer 
Reported-by: David Hauck 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 2f9f24e15088d2ef3244d088a9604d7e98c9c625)
Signed-off-by: Julia Cartwright 
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0d3a40b24304..ee11a59e53ff 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1876,7 +1876,7 @@ EXPORT_SYMBOL(wake_up_process);
  */
 int wake_up_lock_sleeper(struct task_struct *p)
 {
-   return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
+   return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
 }
 
 int wake_up_state(struct task_struct *p, unsigned int state)
-- 
2.13.1



[PATCH RT 4/6] sched: Remove TASK_ALL

2017-08-07 Thread Julia Cartwright
4.1.42-rt50-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
From: Peter Zijlstra 

It's unused:

$ git grep "\<TASK_ALL\>" | wc -l
1

And dangerous, kill the bugger.

Cc: stable...@vger.kernel.org
Acked-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit a1762267d95649bddf6e94e7e3305e0207d0fff0)
Signed-off-by: Julia Cartwright 
---
 include/linux/sched.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d51525ce2c41..7587d6181cd2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -227,7 +227,6 @@ extern char ___assert_task_state[1 - 2*!!(
 
 /* Convenience macros for the sake of wake_up */
 #define TASK_NORMAL(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
-#define TASK_ALL   (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)
 
 /* get_task_state() */
 #define TASK_REPORT(TASK_RUNNING | TASK_INTERRUPTIBLE | \
-- 
2.13.1



Re: [PATCH] [RFC] tpm_tis: tpm_tcg_flush() after iowrite*()s

2017-08-07 Thread Julia Cartwright
On Fri, Aug 04, 2017 at 04:56:51PM -0500, Haris Okanovic wrote:
> I have a latency issue using a SPI-based TPM chip with tpm_tis driver
> from non-rt usermode application, which induces ~400 us latency spikes
> in cyclictest (Intel Atom E3940 system, PREEMPT_RT_FULL kernel).
>
> The spikes are caused by a stalling ioread8() operation, following a
> sequence of 30+ iowrite8()s to the same address. I believe this happens
> because the writes are cached (in cpu or somewhere along the bus), which
> gets flushed on the first LOAD instruction (ioread*()) that follows.

To use the ARM parlance, these accesses aren't "cached" (which would
imply that a result could be returned to the load from any intermediate
node in the interconnect), but instead are "bufferable".

It is really unfortunate that we continue to run into this class of
problem across various CPU vendors and various underlying bus
technologies; it's the continuing curse of running a PREEMPT_RT kernel on
commodity hardware.  RT is not easy :)

> The enclosed change appears to fix this issue: read the TPM chip's
> access register (status code) after every iowrite*() operation.

Are we engaged in a game of whack-a-mole with all of the drivers which
use this same access pattern (of which I imagine there are quite a
few!)?

I'm wondering if we should explore the idea of adding a load in the
iowriteN()/writeX() macros (marking those accesses in which reads cause
side effects explicitly, redirecting to a _raw() variant or something).

Obviously that would be expensive for non-RT use cases, but for helping
constrain latency, it may be worth it for RT.
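
As a rough illustration of the idea (this is a sketch of a hypothetical
helper a driver could open-code today, not an existing kernel API):

  /* Pair each posted write with a read-back so the write buffer drains
   * here, where the cost is attributable, instead of at some later
   * latency-sensitive load.  The read-back must target a register whose
   * reads are side-effect free, e.g. a status/access register. */
  static inline void iowrite8_flush(u8 value, void __iomem *addr,
                                    void __iomem *flush_addr)
  {
          iowrite8(value, addr);
          (void)ioread8(flush_addr);
  }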

   Julia


signature.asc
Description: PGP signature


[ANNOUNCE] 4.1.42-rt49

2017-08-02 Thread Julia Cartwright
Hello RT Folks,

I'm happy to announce the 4.1.42-rt49 stable release.

This release is just an update to the new stable 4.1.42 version and no
RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.1-rt
  Head SHA1: 64fd80a3b0ef4db0d91b6fa224f49197745ef503


Or to build 4.1.42-rt49 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.1.42.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.42-rt49.patch.xz

Let me know if there are any problems!

   Julia


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 !//RE: kernel BUG at kernel/locking/rtmutex.c:1027

2017-06-27 Thread Julia Cartwright
On Tue, Jun 27, 2017 at 05:47:41AM +, Feng Feng24 Liu wrote:
> Hi, Julia
>   Thanks for your kindly hit. I will try this patch
>   The problem is accidental. I will try to reproduce it.
>   BTW, could you help to give the link about the emails which
>   discuss about " nsfs: mark dentry with DCACHE_RCUACCESS ". I
>   cannot find them(just patch email)

Conversation upstream started with a syzkaller-found problem here:

   https://marc.info/?l=linux-netdev&m=149183505630266&w=2

Hopefully that helps,
   Julia


signature.asc
Description: PGP signature


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 !//RE: kernel BUG at kernel/locking/rtmutex.c:1027

2017-06-26 Thread Julia Cartwright
On Mon, Jun 26, 2017 at 04:54:36PM +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-26 10:24:18 [-0400], Steven Rostedt wrote:
> > >  CPU: 17 PID: 1738811 Comm: ip Not tainted 4.4.70-thinkcloud-nfv #1
> > >  Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01GR174, BIOS 
> > > -[TCE124M-2.10]- 06/23/2016
> > >  task: 881cda2c27c0 ti: 881ea0538000 task.ti: 881ea0538000
> > >  RIP: 0010:[]  [] 
> > > __try_to_take_rt_mutex+0x34/0x160
> > >  RSP: 0018:881ea053bb50  EFLAGS: 00010082
> > >  RAX:  RBX: 881f805416a8 RCX: 
> > >  RDX: 881ea053bb98 RSI: 881cda2c27c0 RDI: 881f805416a8
> > >  RBP: 881ea053bb60 R08: 0001 R09: 0002
> > >  R10: 0a01 R11: 0001 R12: 881cda2c27c0
> > >  R13: 881cda2c27c0 R14: 0202 R15: 881f6b0c27c0
> > >  FS:  7f28be315740() GS:88205f8c() 
> > > knlGS:
> > >  CS:  0010 DS:  ES:  CR0: 80050033
> > >  CR2: 0038 CR3: 001e9e479000 CR4: 003406e0
> > >  DR0:  DR1:  DR2: 
> > >  DR3:  DR6: fffe0ff0 DR7: 0400
> > >  Stack:
> > >   881f805416a8 881ea053bb98 881ea053bc28 81a8f03d
> > >   881ea053c000 01ff881ea053bb90 881cda2c27c0 881f6b0c27c1
> > >   881cda2c2eb0 0001  
> > >  Call Trace:
> > >   [] rt_spin_lock_slowlock+0x13d/0x390
> > >   [] rt_spin_lock+0x1f/0x30
> > >   [] lockref_get_not_dead+0xf/0x50
> > >   [] ns_get_path+0x61/0x1d0
> > 
> > Hmm, this is in the filesystem code. What were you doing when this
> > happened?
> 
> and do you have any patches except the RT patch?

This stack trace looks very similar to an upstream use-after-free bug
fixed in v4.11 (commit 073c516ff735, "nsfs: mark dentry with
DCACHE_RCUACCESS", attached below). It's tagged for stable, but it doesn't
look like it has trickled its way back to 4.4.y yet.

Can you reproduce this problem reliably?  Can you confirm that the below
fixes it (it'll require some minor cajoling to apply cleanly)?

   Julia

-- 8< --
From: Cong Wang 
Date: Wed, 19 Apr 2017 15:11:00 -0700
Subject: [PATCH] nsfs: mark dentry with DCACHE_RCUACCESS

Andrey reported a use-after-free in __ns_get_path():

  spin_lock include/linux/spinlock.h:299 [inline]
  lockref_get_not_dead+0x19/0x80 lib/lockref.c:179
  __ns_get_path+0x197/0x860 fs/nsfs.c:66
  open_related_ns+0xda/0x200 fs/nsfs.c:143
  sock_ioctl+0x39d/0x440 net/socket.c:1001
  vfs_ioctl fs/ioctl.c:45 [inline]
  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
  SYSC_ioctl fs/ioctl.c:700 [inline]
  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691

We are under rcu read lock protection at that point:

rcu_read_lock();
d = atomic_long_read(&ns->stashed);
if (!d)
goto slow;
dentry = (struct dentry *)d;
if (!lockref_get_not_dead(&dentry->d_lockref))
goto slow;
rcu_read_unlock();

but don't use a proper RCU API on the free path, therefore a parallel
__d_free() could free it at the same time.  We need to mark the stashed
dentry with DCACHE_RCUACCESS so that __d_free() will be called after all
readers leave RCU.

Fixes: e149ed2b805f ("take the targets of /proc/*/ns/* symlinks to separate fs")
Cc: Alexander Viro 
Cc: Andrew Morton 
Reported-by: Andrey Konovalov 
Signed-off-by: Cong Wang 
Signed-off-by: Linus Torvalds 
---
 fs/nsfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 1656843e87d2..323f492e0822 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -91,6 +91,7 @@ slow:
return ERR_PTR(-ENOMEM);
}
d_instantiate(dentry, inode);
+   dentry->d_flags |= DCACHE_RCUACCESS;
dentry->d_fsdata = (void *)ns->ops;
d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry);
if (d) {
-- 
2.13.1


[ANNOUNCE] 4.1.40-rt48

2017-06-07 Thread Julia Cartwright
Hello RT Folks,

I'm happy to announce the 4.1.40-rt48 stable release.

This release is just an update to the new stable 4.1.40 version and no
RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.1-rt
  Head SHA1: a87c589ef7f10183dc648ef6be7e53eb3143c42e


Or to build 4.1.40-rt48 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.1.40.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.40-rt48.patch.xz

Let me know if there are any problems!

   Julia


[PATCH] md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock

2017-04-28 Thread Julia Cartwright
On mainline, there is no functional difference, just less code, and
symmetric lock/unlock paths.

On PREEMPT_RT builds, this fixes the following warning, seen by
Alexander GQ Gerasiov, due to the sleeping nature of spinlocks.

   BUG: sleeping function called from invalid context at 
kernel/locking/rtmutex.c:993
   in_atomic(): 0, irqs_disabled(): 1, pid: 58, name: kworker/u12:1
   CPU: 5 PID: 58 Comm: kworker/u12:1 Tainted: GW   
4.9.20-rt16-stand6-686 #1
   Hardware name: Supermicro SYS-5027R-WRF/X9SRW-F, BIOS 3.2a 10/28/2015
   Workqueue: writeback wb_workfn (flush-253:0)
   Call Trace:
dump_stack+0x47/0x68
? migrate_enable+0x4a/0xf0
___might_sleep+0x101/0x180
rt_spin_lock+0x17/0x40
add_stripe_bio+0x4e3/0x6c0 [raid456]
? preempt_count_add+0x42/0xb0
raid5_make_request+0x737/0xdd0 [raid456]

Reported-by: Alexander GQ Gerasiov 
Tested-by: Alexander GQ Gerasiov 
Signed-off-by: Julia Cartwright 
---
Hey All-

While this primarily fixes a problem on RT, the patch is equally applicable
upstream, and as such it probably makes sense to be pulled through the md
tree.  It may also make sense to pull it directly into rt-devel.

Alexander-

I turned your "I confirm the fix" to a 'Tested-by', let me know if that's a 
problem.

Thanks,

   Julia

 drivers/md/raid5.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index fa2c4de32a64..54dc2995aeee 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -110,8 +110,7 @@ static inline void unlock_device_hash_lock(struct r5conf 
*conf, int hash)
 static inline void lock_all_device_hash_locks_irq(struct r5conf *conf)
 {
int i;
-   local_irq_disable();
-   spin_lock(conf->hash_locks);
+   spin_lock_irq(conf->hash_locks);
for (i = 1; i < NR_STRIPE_HASH_LOCKS; i++)
spin_lock_nest_lock(conf->hash_locks + i, conf->hash_locks);
spin_lock(&conf->device_lock);
@@ -121,9 +120,9 @@ static inline void unlock_all_device_hash_locks_irq(struct 
r5conf *conf)
 {
int i;
spin_unlock(&conf->device_lock);
-   for (i = NR_STRIPE_HASH_LOCKS; i; i--)
-   spin_unlock(conf->hash_locks + i - 1);
-   local_irq_enable();
+   for (i = NR_STRIPE_HASH_LOCKS - 1; i; i--)
+   spin_unlock(conf->hash_locks + i);
+   spin_unlock_irq(conf->hash_locks);
 }
 
 /* bio's attached to a stripe+device for I/O are linked together in bi_sector
@@ -732,12 +731,11 @@ static bool is_full_stripe_write(struct stripe_head *sh)
 
 static void lock_two_stripes(struct stripe_head *sh1, struct stripe_head *sh2)
 {
-   local_irq_disable();
if (sh1 > sh2) {
-   spin_lock(&sh2->stripe_lock);
+   spin_lock_irq(&sh2->stripe_lock);
spin_lock_nested(&sh1->stripe_lock, 1);
} else {
-   spin_lock(&sh1->stripe_lock);
+   spin_lock_irq(&sh1->stripe_lock);
spin_lock_nested(&sh2->stripe_lock, 1);
}
 }
@@ -745,8 +743,7 @@ static void lock_two_stripes(struct stripe_head *sh1, 
struct stripe_head *sh2)
 static void unlock_two_stripes(struct stripe_head *sh1, struct stripe_head 
*sh2)
 {
spin_unlock(&sh1->stripe_lock);
-   spin_unlock(&sh2->stripe_lock);
-   local_irq_enable();
+   spin_unlock_irq(&sh2->stripe_lock);
 }
 
 /* Only freshly new full stripe normal write stripe can be added to a batch 
list */
-- 
2.12.0



Re: [PATCH RT 1/1] remoteproc: Prevent schedule while atomic

2017-03-23 Thread Julia Cartwright
On Thu, Mar 23, 2017 at 10:26:49AM +, Lee Jones wrote:
> On Thu, 23 Mar 2017, Lionel DEBIEVE wrote:
> 
> > On 03/22/2017 07:47 PM, Julia Cartwright wrote:
> > > On Wed, Mar 22, 2017 at 01:30:12PM -0500, Grygorii Strashko wrote:
> > >> On 03/22/2017 01:01 PM, Steven Rostedt wrote:
> > >>> On Wed, 22 Mar 2017 12:37:59 -0500
> > >>> Julia Cartwright  wrote:
> > >>>
> > >>>> Which kernel were you testing on, here?  From what I can tell, this
> > >>>> should have been fixed with Thomas's commit:
> > >>>>
> > >>>> 2a1d3ab8986d ("genirq: Handle force threading of irqs with primary
> > >>>> and thread handler")
> > >>> Thanks Julia for looking into this. I just looked at the code, and saw
> > >>> that it does very little with the lock held, and was fine with the
> > >>> conversion. But if that interrupt handler should be in a thread, we
> > >>> should see if that's the issue first.
> > >>
> > >> It will not be threaded because there are IRQF_ONESHOT used.
> > >>
> > >>  ret = devm_request_threaded_irq(&pdev->dev, irq,
> > >>  sti_mbox_irq_handler,
> > >>  sti_mbox_thread_handler,
> > >>  IRQF_ONESHOT, mdev->name, mdev);
> > > Indeed.  I had skipped over this important detail when I was skimming
> > > through the code.
> > >
> > > Thanks for clarifying!
> > >
> > > Is IRQF_ONESHOT really necessary for this device?  The primary handler
> > > invokes sti_mbox_disable_channel() on the interrupting channel, which I
> > > would hope would acquiesce the pending interrupt at the device-level?
> 
> Not sure.  This part of the code is a remnant from when I re-wrote it.
> 
> What is the alternative?

If, by the time the registered primary handler has completed, you can
ensure that the device is no longer asserting an interrupt to the
connected irq chip, then IRQF_ONESHOT isn't necessary, because it's
safe for the irq core to unmask the interrupt as soon as the primary
handler returns.

It appears that this driver might be able to make that guarantee, if
that's what sti_mbox_disable_channel() is doing.
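
In other words, the shape would be something like the sketch below (the
foo_* names are hypothetical, this is not the sti-mailbox driver):

static irqreturn_t foo_irq_handler(int irq, void *data)
{
        struct foo_mbox *mbox = data;

        /* Hypothetical helper: make the device stop asserting the line. */
        foo_mbox_disable_channel(mbox);

        return IRQ_WAKE_THREAD;
}

static int foo_request_irq(struct device *dev, struct foo_mbox *mbox, int irq)
{
        /*
         * No IRQF_ONESHOT: the irq core may unmask the line as soon as
         * foo_irq_handler() returns, because the device is already quiet.
         */
        return devm_request_threaded_irq(dev, irq,
                                         foo_irq_handler,
                                         foo_mbox_thread_handler,
                                         0, dev_name(dev), mbox);
}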

> NB: What does 'acquiesce' mean in this context?  Is that a typo?

I used 'acquiesce' to mean what I mentioned before: preventing the device
from asserting the interrupt.  Perhaps it's an uncommon use of the word.

> > > Also, as written there are num_inst reads of STI_IRQ_VAL_OFFSET in the
> > > primary handler, which seems inefficient...(unless of course reading
> > > incurs side effects, here).
>
> Inefficient in what respect?

I've since looked again and realized that the register base is
recalculated based on 'instance', so disregard this comment.

   Julia




Re: [PATCH v4 1/4] pinctrl: rockchip: remove unnecessary locking

2017-03-23 Thread Julia Cartwright
On Thu, Mar 23, 2017 at 09:01:43PM +0100, Heiko Stübner wrote:
> On Thursday, March 23, 2017 at 13:29:10 CET, Julia Cartwright wrote:
> > On Thu, Mar 23, 2017 at 06:55:50PM +0100, Heiko Stübner wrote:
> > > On Thursday, March 23, 2017 at 17:51:53 CET, John Keeping wrote:
> > > > On Thu, 23 Mar 2017 11:10:20 -0500, Julia Cartwright wrote:
[..]
> > > > > > @@ -1185,17 +1177,14 @@ static int rockchip_set_drive_perpin(struct
> > > > > > rockchip_pin_bank *bank,> >
> > > > > > 
> > > > > > rmask = BIT(15) | BIT(31);
> > > > > > data |= BIT(31);
> > > > > > ret = regmap_update_bits(regmap, reg, rmask, 
> > > > > > data);
> > > > > > 
> > > > > > -   if (ret) {
> > > > > > -   spin_unlock_irqrestore(&bank->slock, 
> > > > > > flags);
> > > > > > +   if (ret)
> > > > > > 
> > > > > > return ret;
> > > > > > 
> > > > > > -   }
> > > > > > 
> > > > > > rmask = 0x3 | (0x3 << 16);
> > > > > > temp |= (0x3 << 16);
> > > > > > reg += 0x4;
> > > > > > ret = regmap_update_bits(regmap, reg, rmask, 
> > > > > > temp);
> > > > > 
> > > > > Killing the lock here means the writes to this pair of registers
> > > > > (reg
> > > > > and reg + 4) can be observed non-atomically.  Have you convinced
> > > > > yourself that this isn't a problem?
> > > > 
> > > > I called it out in v1 [1] since this bit is new since v4.4 where I
> > > > originally wrote this patch, and didn't get any comments about it.
> > > > 
> > > > I've convinced myself that removing the lock doesn't cause any problems
> > > > for writing to the hardware: if the lock would prevent writes
> > > > interleaving then it means that two callers are trying to write
> > > > different drive strengths to the same pin, and even with a lock here one
> > > > of them will end up with the wrong drive strength.
> > > > 
> > > > But it does mean that a read via rockchip_get_drive_perpin() may see an
> > > > inconsistent state.  I think adding a new lock specifically for this
> > > > particular drive strength bit is overkill and I can't find a scenario
> > > > where this will actually matter; any driver setting a pinctrl config
> > > > must already be doing something to avoid racing two configurations
> > > > against each other, mustn't it?
> > > 
> > > also, pins can normally only be requested once - see drivers complaining
> > > if
> > > one of their pins is already held by a different driver. So if you really
> > > end up with two things writing to the same drive strength bits, the
> > > driver holding the pins must be really messed up anyway :-)
> > 
> > My concern would be if two independent pins' drive strength
> > configuration would walk on each other, because they happen to be
> > configured via the same registers.
> > 
> > If that's not possible, then great!
> 
> ah, sorry that we didn't make that clearer in the beginning, but no, that
> also isn't possible. The registers use a "hiword-mask" scheme, so on each
> write to bit (x) the corresponding write-mask bit (x+16) also needs to be
> set; each write therefore only affects the bits enabled in that way.

Awesome, thanks for clearing things up.

Thanks!
   Julia
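
For reference, a sketch of the hiword-mask idea with hypothetical names
(not the rockchip driver): the upper 16 bits of each write select which
of the lower 16 bits take effect, so unrelated fields in the same
register are never clobbered.

/*
 * Write 'val' into the field covered by 'mask' at 'shift' without
 * touching any other field in the register (hypothetical helper).
 */
static void hiword_write(void __iomem *reg, u32 val, u32 mask, u32 shift)
{
        writel((val << shift) | (mask << (shift + 16)), reg);
}

/* Example: set a 2-bit drive-strength field at bits [3:2]. */
static void set_drive_example(void __iomem *reg)
{
        hiword_write(reg, 0x1, 0x3, 2);
}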



