Re: [PATCH] fsnotify: don't call mutex_lock from TASK_INTERRUPTIBLE context
On Sat, 1 Nov 2014 23:51:38 -0400 Sasha Levin <sasha.le...@oracle.com> wrote:

> Sleeping functions should only be called from TASK_RUNNING. The following
> code in fanotify_read():
>
> 	prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
>
> 	mutex_lock(&group->notification_mutex);
>
> would call it under TASK_INTERRUPTIBLE, and trigger a warning:
>
> [12326.092094] WARNING: CPU: 27 PID: 30207 at kernel/sched/core.c:7305 __might_sleep+0xd2/0x110()
> [12326.092878] do not call blocking ops when !TASK_RUNNING; state=1 set at prepare_to_wait (./arch/x86/include/asm/current.h:14 kernel/sched/wait.c:179)
> [12326.093938] Modules linked in:
>
> ...

It's a fairly minor problem - if mutex_lock() hits contention we get
flipped into TASK_RUNNING, the schedule() immediately returns and we take
another trip around the loop.

fanotify_read() also calls copy_event_to_user()->copy_to_user() in
TASK_INTERRUPTIBLE state. That's a bug, and it is why the first thing
handle_mm_fault() does is set TASK_RUNNING.

> Instead of trying to fix fanotify_read() I've converted
> notification_mutex into a spinlock. I didn't see a reason why it
> should be a mutex, nor did anything complain when I ran the same
> tests again.

This could be a latency problem - those lists can get very long.

I wonder if we can zap the prepare_to_wait()/finish_wait() and use
something like

	wait_event_interruptible(notification_waitq, foo(group, count));

	int foo(struct fsnotify_group *group, size_t count)
	{
		int ret;

		mutex_lock(&group->notification_mutex);
		ret = get_one_event(group, count);
		mutex_unlock(&group->notification_mutex);
		return ret;
	}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[PATCH] fsnotify: don't call mutex_lock from TASK_INTERRUPTIBLE context
Sleeping functions should only be called from TASK_RUNNING. The following
code in fanotify_read():

	prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);

	mutex_lock(&group->notification_mutex);

would call it under TASK_INTERRUPTIBLE, and trigger a warning:

[12326.092094] WARNING: CPU: 27 PID: 30207 at kernel/sched/core.c:7305 __might_sleep+0xd2/0x110()
[12326.092878] do not call blocking ops when !TASK_RUNNING; state=1 set at prepare_to_wait (./arch/x86/include/asm/current.h:14 kernel/sched/wait.c:179)
[12326.093938] Modules linked in:
[12326.094261] CPU: 27 PID: 30207 Comm: fanotify01 Not tainted 3.18.0-rc2-next-20141031-sasha-00057-g9a0b11b-dirty #1435
[12326.095255]  0009 88003b563000 88005bfbbc38
[12326.096019]  90dabf13 88005bfbbc98 88005bfbbc88
[12326.096791]  8c1b12fa 88005bfbbc88 8c1f6112 001d76c0
[12326.097610] Call Trace:
[12326.097881] dump_stack (lib/dump_stack.c:52)
[12326.098383] warn_slowpath_common (kernel/panic.c:432)
[12326.098973] ? __might_sleep (kernel/sched/core.c:7311)
[12326.099512] ? prepare_to_wait (./arch/x86/include/asm/current.h:14 kernel/sched/wait.c:179)
[12326.100100] warn_slowpath_fmt (kernel/panic.c:446)
[12326.100704] ? check_chain_key (kernel/locking/lockdep.c:2190)
[12326.101319] ? prepare_to_wait (./arch/x86/include/asm/current.h:14 kernel/sched/wait.c:179)
[12326.101870] ? prepare_to_wait (./arch/x86/include/asm/current.h:14 kernel/sched/wait.c:179)
[12326.102421] __might_sleep (kernel/sched/core.c:7311)
[12326.102949] ? prepare_to_wait (./arch/x86/include/asm/current.h:14 kernel/sched/wait.c:179)
[12326.103502] ? prepare_to_wait (kernel/sched/wait.c:181)
[12326.104060] mutex_lock_nested (kernel/locking/mutex.c:623)
[12326.104620] ? preempt_count_sub (kernel/sched/core.c:2641)
[12326.105324] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:161 kernel/locking/spinlock.c:191)
[12326.105986] ? prepare_to_wait (kernel/sched/wait.c:181)
[12326.106542] fanotify_read (./arch/x86/include/asm/atomic.h:27 include/linux/mutex.h:131 fs/notify/fanotify/fanotify_user.c:57 fs/notify/fanotify/fanotify_user.c:273)
[12326.107070] ? abort_exclusive_wait (kernel/sched/wait.c:291)
[12326.107676] vfs_read (fs/read_write.c:430)
[12326.108169] SyS_read (fs/read_write.c:569 fs/read_write.c:562)
[12326.108652] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)

Instead of trying to fix fanotify_read() I've converted
notification_mutex into a spinlock. I didn't see a reason why it should
be a mutex, nor did anything complain when I ran the same tests again.

Signed-off-by: Sasha Levin <sasha.le...@oracle.com>
---
 fs/notify/fanotify/fanotify_user.c | 18 +++++++++---------
 fs/notify/group.c                  |  2 +-
 fs/notify/inotify/inotify_user.c   | 16 ++++++++--------
 fs/notify/notification.c           | 22 +++++++++++-----------
 include/linux/fsnotify_backend.h   |  2 +-
 5 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index c991616..f03bffc 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -49,12 +49,12 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
  * enough to fit in "count". Return an error pointer if the count
  * is not large enough.
  *
- * Called with the group->notification_mutex held.
+ * Called with the group->notification_lock held.
  */
 static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
 					    size_t count)
 {
-	BUG_ON(!mutex_is_locked(&group->notification_mutex));
+	BUG_ON(!spin_is_locked(&group->notification_lock));
 
 	pr_debug("%s: group=%p count=%zd\n", __func__, group, count);
 
@@ -64,7 +64,7 @@ static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
 	if (FAN_EVENT_METADATA_LEN > count)
 		return ERR_PTR(-EINVAL);
 
-	/* held the notification_mutex the whole time, so this is the
+	/* held the notification_lock the whole time, so this is the
 	 * same event we peeked above */
 	return fsnotify_remove_first_event(group);
 }
@@ -244,10 +244,10 @@ static unsigned int fanotify_poll(struct file *file, poll_table *wait)
 	int ret = 0;
 
 	poll_wait(file, &group->notification_waitq, wait);
-	mutex_lock(&group->notification_mutex);
+	spin_lock(&group->notification_lock);
 	if (!fsnotify_notify_queue_is_empty(group))
 		ret = POLLIN | POLLRDNORM;
-	mutex_unlock(&group->notification_mutex);
+	spin_unlock(&group->notification_lock);
 
 	return ret;
 }
@@ -269,9 +269,9 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 	while (1) {
 		prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
 
-		mutex_lock(&group->notification_mutex);
+
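The last hunk above is cut off mid-line. For orientation only, given that
every other hunk in the diff replaces mutex_lock/mutex_unlock on
notification_mutex with spin_lock/spin_unlock on notification_lock, the
converted fanotify_read() wait loop would presumably take roughly this
shape (kernel code, reconstructed from the truncated hunk, not
standalone-runnable; the loop body beyond the lock calls is an assumption):

```c
/* Sketch only: reconstructed from the truncated hunk above.
 * Everything except the spin_lock/spin_unlock substitution is
 * assumed from the pre-patch shape of fanotify_read(). */
while (1) {
	prepare_to_wait(&group->notification_waitq, &wait,
			TASK_INTERRUPTIBLE);

	spin_lock(&group->notification_lock);
	kevent = get_one_event(group, count);
	spin_unlock(&group->notification_lock);

	/* ... handle kevent, or schedule() if the queue was empty ... */
}
```

Note this keeps the prepare_to_wait()/TASK_INTERRUPTIBLE structure intact;
the spinlock merely avoids the might_sleep() warning because taking a
spinlock does not block, which is exactly the trade-off (shorter critical
sections, but busy-waiting on long lists) raised in the reply above.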