[Devel] [PATCH RHEL7 COMMIT] ms/new helper: wait_event_killable_exclusive()
The commit is pushed to "branch-rh7-3.10.0-327.36.1.vz7.19.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.36.1.vz7.19.1
-->
commit 33cba27d15135e157617e8ae9147baf9925f5e99
Author: Stanislav Kinsburskiy
Date:   Sat Oct 15 01:49:04 2016 +0400

    ms/new helper: wait_event_killable_exclusive()

    Patchset description:

    fuse: fix signals handling while processing request

    This patch fixes wrong SIGBUS result in page fault handler for fuse file,
    when process received a signal.

    https://jira.sw.ru/browse/PSBM-53581

    Stanislav Kinsburskiy (2):
          new helper: wait_event_killable_exclusive()
          fuse: handle only fatal signals while waiting request answer

    ==========
    This patch description:

    Backport of ms commit 6a0fb306738994d6f091791aeb11a5dc87ad8f4c ("new
    helper: wait_event_killable_exclusive()").

    Signed-off-by: Al Viro
    Signed-off-by: Stanislav Kinsburskiy
    Acked-by: Maxim Patlasov
---
 include/linux/wait.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 65da9e3..8475f2d 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -710,6 +710,32 @@ do { \
 	__ret; \
 })
 
+#define __wait_event_killable_exclusive(wq, condition, ret) \
+do { \
+	DEFINE_WAIT(__wait); \
+	\
+	for (;;) { \
+		prepare_to_wait_exclusive(&wq, &__wait, TASK_KILLABLE); \
+		if (condition) \
+			break; \
+		if (!fatal_signal_pending(current)) { \
+			schedule(); \
+			continue; \
+		} \
+		ret = -ERESTARTSYS; \
+		break; \
+	} \
+	finish_wait(&wq, &__wait); \
+} while (0)
+
+
+#define wait_event_killable_exclusive(wq, condition) \
+({ \
+	int __ret = 0; \
+	if (!(condition)) \
+		__wait_event_killable_exclusive(wq, condition, __ret); \
+	__ret; \
+})
 
 #define __wait_event_lock_irq(wq, condition, lock, cmd) \
 do { \
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
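For readers who want to follow the control flow of the new helper outside the kernel, here is a rough user-space sketch of the loop. Everything prefixed with model_ or MODEL_ is hypothetical and exists only for this illustration; the real macro sleeps on a wait queue in TASK_KILLABLE state, which this sketch replaces with a simple wakeup counter. A non-fatal signal is not modelled at all, because it would not wake a TASK_KILLABLE sleeper in the first place.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric value of ERESTARTSYS in the kernel's include/linux/errno.h. */
#define MODEL_ERESTARTSYS 512

/* Hypothetical stand-in for a sleeping task; not a kernel structure. */
struct model_task {
	int wakeups_until_ready;   /* wakeups needed before the condition holds */
	bool fatal_signal_pending; /* models fatal_signal_pending(current) */
	int schedule_calls;        /* how many times the task went to sleep */
};

static bool model_condition(const struct model_task *t)
{
	return t->wakeups_until_ready <= 0;
}

/* Models schedule(): the task sleeps and is woken up exactly once. */
static void model_schedule(struct model_task *t)
{
	t->schedule_calls++;
	t->wakeups_until_ready--;
}

/* Same control flow as wait_event_killable_exclusive() in the patch. */
static int model_wait_event_killable(struct model_task *t)
{
	int ret = 0;

	if (model_condition(t))
		return 0; /* fast path: condition already true, no sleep */

	for (;;) {
		if (model_condition(t))
			break;
		if (!t->fatal_signal_pending) {
			model_schedule(t);
			continue;
		}
		ret = -MODEL_ERESTARTSYS; /* only a fatal signal aborts the wait */
		break;
	}

	/* A non-zero result can only come from a pending fatal signal. */
	assert(ret == 0 || t->fatal_signal_pending);
	return ret;
}
```

Note that the condition is checked before the signal state, so a waiter whose condition is already true returns 0 even when a fatal signal is pending.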
[Devel] [PATCH RHEL7 COMMIT] ms/fuse: handle only fatal signals while waiting request answer
The commit is pushed to "branch-rh7-3.10.0-327.36.1.vz7.19.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.36.1.vz7.19.1
-->
commit 1a3de6325f248eae8dfd527617fd33a5aaeea449
Author: Stanislav Kinsburskiy
Date:   Sat Oct 15 01:49:05 2016 +0400

    ms/fuse: handle only fatal signals while waiting request answer

    Patchset description:

    fuse: fix signals handling while processing request

    This patch fixes wrong SIGBUS result in page fault handler for fuse file,
    when process received a signal.

    Stanislav Kinsburskiy (2):
          new helper: wait_event_killable_exclusive()
          fuse: handle only fatal signals while waiting request answer

    ==========
    This patch description:

    This patch is a backport of ms commit
    7d3a07fcb8a0d5c06718de14fb91fdf1ef20a0e2 ("fuse: don't mess with blocking
    signals").

    just use wait_event_killable{,_exclusive}().

    Signed-off-by: Al Viro

    Although this commit looks more like a cleanup, it is not.
    It fixes an issue with signals that can abort a page read in fuse called
    from a page fault, which immediately leads to SIGBUS being sent to the
    caller.

    The issue is in the block_sigs() implementation: it doesn't drop the
    pending-signal flag, which interrupts wait_event_interruptible() in
    request_wait_answer(), while that wait is supposed to be interrupted by
    SIGKILL only.
    IOW, any signal that arrives at a process handling a page fault on a fuse
    file _before_ request_wait_answer() is called will interrupt the request
    and produce a SIGBUS error in the page fault handler (filemap_fault).

    Once again: block_sigs() doesn't (!) clear the TIF_SIGPENDING flag. All it
    does is block future signals from arriving. Moreover, __set_task_blocked()
    calls recalc_sigpending(), which checks whether any of the signals to
    block is present in the process pending mask, and if so, sets (!)
    TIF_SIGPENDING on the task.
    IOW, any pending signal remains pending after the call to block_sigs().
    And that is the root of the issue.
    https://jira.sw.ru/browse/PSBM-53581

    Signed-off-by: Stanislav Kinsburskiy
    Acked-by: Maxim Patlasov
---
 fs/fuse/dev.c | 42 --
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index a0d0b1a..0b6d000 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -99,19 +99,6 @@ void fuse_request_free(struct fuse_req *req)
 	kmem_cache_free(fuse_req_cachep, req);
 }
 
-static void block_sigs(sigset_t *oldset)
-{
-	sigset_t mask;
-
-	siginitsetinv(&mask, sigmask(SIGKILL));
-	sigprocmask(SIG_BLOCK, &mask, oldset);
-}
-
-static void restore_sigs(sigset_t *oldset)
-{
-	sigprocmask(SIG_SETMASK, oldset, NULL);
-}
-
 void __fuse_get_request(struct fuse_req *req)
 {
 	atomic_inc(&req->count);
@@ -144,15 +131,9 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
 	atomic_inc(&fc->num_waiting);
 
 	if (fuse_block_alloc(fc, for_background)) {
-		sigset_t oldset;
-		int intr;
-
-		block_sigs(&oldset);
-		intr = wait_event_interruptible_exclusive(fc->blocked_waitq,
-				!fuse_block_alloc(fc, for_background));
-		restore_sigs(&oldset);
 		err = -EINTR;
-		if (intr)
+		if (wait_event_killable_exclusive(fc->blocked_waitq,
+				!fuse_block_alloc(fc, for_background)))
 			goto out;
 	}
 
@@ -412,6 +393,19 @@ __acquires(fc->lock)
 	spin_lock(&fc->lock);
 }
 
+static void wait_answer_killable(struct fuse_conn *fc,
+				 struct fuse_req *req)
+__releases(fc->lock)
+__acquires(fc->lock)
+{
+	if (fatal_signal_pending(current))
+		return;
+
+	spin_unlock(&fc->lock);
+	wait_event_killable(req->waitq, req->state == FUSE_REQ_FINISHED);
+	spin_lock(&fc->lock);
+}
+
 static void queue_interrupt(struct fuse_conn *fc, struct fuse_req *req)
 {
 	list_add_tail(&req->intr_entry, &fc->interrupts);
@@ -438,12 +432,8 @@ __acquires(fc->lock)
 	}
 
 	if (!req->force) {
-		sigset_t oldset;
-
 		/* Only fatal signals may interrupt this */
-		block_sigs(&oldset);
-		wait_answer_interruptible(fc, req);
-		restore_sigs(&oldset);
+		wait_answer_killable(fc, req);
 
 		if (req->aborted)
 			goto aborted;
Re: [Devel] [PATCH 0/2] fuse: fix signals handling while processing request
14.10.2016 02:23, Maxim Patlasov wrote:
> Stas,
>
> The series looks fine, so:
>
> Acked-by: Maxim Patlasov
>
> But, please, refine the description of the second patch. It must explain
> clearly why the patch fixes the problem: block_sigs() blocks ordinary
> non-fatal signals as expected, but surprisingly SIGTRAP is special: it
> does not matter whether it comes before or after block_sigs(), the latter
> does not affect SIGTRAP at all! And in contrast, wait_event_killable() is
> immune to it -- only fatal sig can wake it up.

No, Maxim. You make a mistake here.
There is nothing special about SIGTRAP (although it is sometimes sent via force_sig_info()).
The problem is as described:

block_sigs() doesn't (!) clear the TIF_SIGPENDING flag. All it does is block future signals from arriving. Moreover, __set_task_blocked() calls recalc_sigpending(), which checks whether any of the signals to block is present in the process pending mask, and if so, sets (!) TIF_SIGPENDING on the task.
IOW, any pending signal remains pending after the call to block_sigs().
And that is the root of the issue (as described in the patch comment).

> Thanks,
> Maxim
>
> On 10/13/2016 03:03 AM, Stanislav Kinsburskiy wrote:
>> This patch fixes wrong SIGBUS result in page fault handler for fuse file,
>> when process received a signal.
>>
>> https://jira.sw.ru/browse/PSBM-53581
>>
>> ---
>>
>> Stanislav Kinsburskiy (2):
>>       new helper: wait_event_killable_exclusive()
>>       fuse: handle only fatal signals while waiting request answer
>>
>>  fs/fuse/dev.c        | 42 --
>>  include/linux/wait.h | 26 ++
>>  2 files changed, 42 insertions(+), 26 deletions(-)
>>
>> --
[Devel] [PATCH rh7 1/4] ms/kernel: add kcov code coverage
From: Dmitry Vyukov

kcov provides code coverage collection for coverage-guided fuzzing (randomized testing). Coverage-guided fuzzing is a testing technique that uses coverage feedback to determine new interesting inputs to a system. A notable user-space example is AFL (http://lcamtuf.coredump.cx/afl/). However, this technique is not widely used for kernel testing due to missing compiler and kernel support.

kcov does not aim to collect as much coverage as possible. It aims to collect more or less stable coverage that is a function of syscall inputs. To achieve this goal it does not collect coverage in soft/hard interrupts, and instrumentation of some inherently non-deterministic or non-interesting parts of the kernel is disabled (e.g. scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the API anticipates additional collection modes. Initially I also implemented a second mode which exposes coverage in a fixed-size hash table of counters (what Quentin used in his original patch). I've dropped the second mode for simplicity.

This patch adds the necessary support on the kernel side. The complementary compiler support was added in gcc revision 231296.

We've used this support to build the syzkaller system call fuzzer, which has found 90 kernel bugs in just 2 months:

https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller. Another (yet unexplored) direction where kcov coverage would greatly help is more traditional "blob mutation". For example, mounting a random blob as a filesystem, or receiving a random blob over wire.

Why not gcov. Typical fuzzing loop looks as follows: (1) reset coverage, (2) execute a bit of code, (3) collect coverage, repeat. A typical coverage can be just a dozen of basic blocks (e.g. an invalid input).
In such context gcov becomes prohibitively expensive as reset/collect coverage steps depend on total number of basic blocks/edges in program (in case of kernel it is about 2M). Cost of kcov depends only on number of executed basic blocks/edges. On top of that, kernel requires per-thread coverage because there are always background threads and unrelated processes that also produce coverage. With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is insecure. But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[a...@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[a...@linux-foundation.org: unbreak allmodconfig]
[a...@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov
Reviewed-by: Kees Cook
Cc: syzkaller
Cc: Vegard Nossum
Cc: Catalin Marinas
Cc: Tavis Ormandy
Cc: Will Deacon
Cc: Quentin Casasnovas
Cc: Kostya Serebryany
Cc: Eric Dumazet
Cc: Alexander Potapenko
Cc: Kees Cook
Cc: Bjorn Helgaas
Cc: Sasha Levin
Cc: David Drysdale
Cc: Ard Biesheuvel
Cc: Andrey Ryabinin
Cc: Kirill A. Shutemov
Cc: Jiri Slaby
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

(cherry-picked from 5c9a8750a6409c63a0f01d51a9024861022f6593)
Signed-off-by: Andrey Ryabinin
---
 Documentation/kcov.txt            | 111 +++
 Makefile                          |  11 +-
 arch/x86/Kconfig                  |   1 +
 arch/x86/boot/Makefile            |   7 +
 arch/x86/boot/compressed/Makefile |   3 +
 arch/x86/kernel/Makefile          |   6 +
 arch/x86/kernel/apic/Makefile     |   4 +
 arch/x86/kernel/cpu/Makefile      |   4 +
 arch/x86/lib/Makefile             |   3 +
 arch/x86/mm/Makefile              |   3 +
 arch/x86/realmode/rm/Makefile     |   3 +
 arch/x86/vdso/Makefile            |   1 +
 include/linux/sched.h             |  11 ++
 kernel/Makefile                   |  12 ++
 kernel/exit.c                     |   2 +
 kernel/fork.c                     |   3 +
 kernel/kcov.c                     | 274 ++
 kernel/sched/Makefile             |   4 +
 lib/Kconfig.debug                 |  21 +++
 lib/Makefile                      |  12 ++
 mm/Makefile                       |  15 +++
 mm/kasan/Makefile                 |   2 +
 scripts/Makefile.lib              |   6 +
 23 files changed, 518 insertions(+), 1 deletion(-)
 create mode
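To illustrate the cost argument made in the "Why not gcov" paragraph, here is a small user-space sketch of a trace-style coverage area. It assumes a layout in which word 0 of the area holds the number of recorded PCs and the PCs follow it; the struct and all model_ names are hypothetical and are not the kernel's definitions. The point is only that reset is a single store and collection touches just the PCs that were actually executed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a trace area; not the kernel struct. */
struct model_kcov {
	unsigned long *area; /* area[0] = number of recorded PCs, PCs follow */
	size_t size;         /* total number of words in area */
};

/* Models what the instrumentation hook does for one executed basic block. */
static void model_trace_pc(struct model_kcov *k, unsigned long pc)
{
	unsigned long pos = k->area[0] + 1;

	if (pos < k->size) { /* PCs are dropped once the area is full */
		k->area[pos] = pc;
		k->area[0] = pos;
	}
}

/* Step (1) of the fuzzing loop: one store, independent of program size. */
static void model_reset(struct model_kcov *k)
{
	k->area[0] = 0;
}

/* Step (3): copies only the PCs recorded since the last reset. */
static size_t model_collect(const struct model_kcov *k,
			    unsigned long *out, size_t out_len)
{
	size_t n = k->area[0];

	assert(n < k->size); /* the counter never exceeds the area */
	if (n > out_len)
		n = out_len;
	for (size_t i = 0; i < n; i++)
		out[i] = k->area[i + 1];
	return n;
}
```

A gcov-style scheme would instead have to clear and scan one counter per basic block in the whole program on every iteration, which is the roughly 2M figure quoted above.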
[Devel] [PATCH rh7 4/4] ms/kcov: properly check if we are in an interrupt
From: Andrey Konovalov

in_interrupt() returns a nonzero value when we are either in an interrupt or have bh disabled via local_bh_disable(). Since we are interested in only ignoring coverage from actual interrupts, do a proper check instead of just calling in_interrupt().

Link: http://lkml.kernel.org/r/1476115803-20712-1-git-send-email-andreyk...@google.com
Signed-off-by: Andrey Konovalov
Acked-by: Dmitry Vyukov
Cc: Nicolai Stange
Cc: Andrey Ryabinin
Cc: Kees Cook
Cc: James Morse
Cc: Vegard Nossum
Cc: Quentin Casasnovas
Signed-off-by: Andrew Morton
Signed-off-by: Andrey Ryabinin
---
 kernel/kcov.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index 91b00e6..83b50fe 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -54,8 +55,15 @@ void notrace __sanitizer_cov_trace_pc(void)
 	/*
 	 * We are interested in code coverage as a function of a syscall inputs,
 	 * so we ignore code executed in interrupts.
+	 * The checks for whether we are in an interrupt are open-coded, because
+	 * 1. We can't use in_interrupt() here, since it also returns true
+	 *    when we are inside local_bh_disable() section.
+	 * 2. We don't want to use (in_irq() | in_serving_softirq() | in_nmi()),
+	 *    since that leads to slower generated code (three separate tests,
+	 *    one for each of the flags).
 	 */
-	if (!t || in_interrupt())
+	if (!t || (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET
+							| NMI_MASK)))
 		return;
 	mode = READ_ONCE(t->kcov_mode);
 	if (mode == KCOV_MODE_TRACE) {
-- 
2.7.3
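The difference between the two checks in this patch can be reproduced with plain bit arithmetic. The sketch below assumes one possible preempt_count layout (8 preemption bits, 8 softirq bits, 4 hardirq bits, 1 NMI bit); the exact widths vary between kernel versions, and every MODEL_/model_ name is made up for the illustration. What matters is that local_bh_disable() adds twice the softirq offset, so it leaves the lowest softirq bit clear, while actually serving a softirq sets that bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout of the counter; real widths depend on the kernel. */
#define MODEL_SOFTIRQ_SHIFT 8
#define MODEL_HARDIRQ_SHIFT 16
#define MODEL_NMI_SHIFT     20

#define MODEL_SOFTIRQ_OFFSET (1UL << MODEL_SOFTIRQ_SHIFT)
#define MODEL_HARDIRQ_OFFSET (1UL << MODEL_HARDIRQ_SHIFT)
#define MODEL_SOFTIRQ_MASK   (0xffUL << MODEL_SOFTIRQ_SHIFT)
#define MODEL_HARDIRQ_MASK   (0xfUL << MODEL_HARDIRQ_SHIFT)
#define MODEL_NMI_MASK       (0x1UL << MODEL_NMI_SHIFT)

/* Disabling bottom halves adds two softirq units, not one. */
#define MODEL_SOFTIRQ_DISABLE_OFFSET (2 * MODEL_SOFTIRQ_OFFSET)

/* Models local_bh_disable() acting on a counter value. */
static unsigned long model_local_bh_disable(unsigned long count)
{
	count += MODEL_SOFTIRQ_DISABLE_OFFSET;
	assert((count & MODEL_SOFTIRQ_MASK) != 0);
	return count;
}

/* Models in_interrupt(): any hardirq, softirq or NMI bit counts. */
static bool model_in_interrupt(unsigned long count)
{
	return (count & (MODEL_HARDIRQ_MASK | MODEL_SOFTIRQ_MASK |
			 MODEL_NMI_MASK)) != 0;
}

/* Models the open-coded check from the patch: one mask, one test. */
static bool model_kcov_in_real_interrupt(unsigned long count)
{
	return (count & (MODEL_HARDIRQ_MASK | MODEL_SOFTIRQ_OFFSET |
			 MODEL_NMI_MASK)) != 0;
}
```

With bottom halves disabled from task context, model_in_interrupt() reports true while model_kcov_in_real_interrupt() reports false, which is the case the patch wants coverage to be collected for.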
[Devel] [PATCH rh7 2/4] ms/kcov: don't trace the code coverage code
From: James Morse

Kcov causes the compiler to add a call to __sanitizer_cov_trace_pc() in every basic block. Ftrace patches in a call to _mcount() to each function it has annotated.

Letting these mechanisms annotate each other is a bad thing. Break the loop by adding 'notrace' to __sanitizer_cov_trace_pc() so that ftrace won't try to patch this code.

This patch lets arm64 with KCOV and STACK_TRACER boot.

Signed-off-by: James Morse
Acked-by: Dmitry Vyukov
Cc: Alexander Potapenko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

(cherry picked from commit bdab42dfc974d15303afbf259f340f374a453974)
Signed-off-by: Andrey Ryabinin
---
 kernel/kcov.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index 9ea7a05..f963fca 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -44,7 +44,7 @@ struct kcov {
  * Entry point from instrumented code.
  * This is called once per basic-block/edge.
  */
-void __sanitizer_cov_trace_pc(void)
+void notrace __sanitizer_cov_trace_pc(void)
 {
 	struct task_struct *t;
 	enum kcov_mode mode;
-- 
2.7.3
[Devel] [PATCH RHEL7 COMMIT] ext4: Discard preallocated block before swap_extents
The commit is pushed to "branch-rh7-3.10.0-327.36.1.vz7.19.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.36.1.vz7.19.1
-->
commit adb6ff95974228cb22b72ab950504d46586196ac
Author: Dmitry Monakhov
Date:   Sat Oct 15 02:14:03 2016 +0400

    ext4: Discard preallocated block before swap_extents

    Inode preallocation consists of two parts (used and unused) fully
    controlled by the inode, so it must be discarded before swapping extents.
    Currently we may skip drop_preallocation if the file is sparse.

    This patch:
    - Moves ext4_discard_preallocations to ext4_swap_extents.
      This makes the code more readable and more reliable for future changes.
    - Cleans up the main move_extent loop.

    xfstests:ext4/024 (pended: https://github.com/dmonakhov/xfstests/commit/7a4763963f73ea5d5bba45eefa484494aa3df7cf)

    Signed-off-by: Dmitry Monakhov
    Reviewed-by: Maxim Patlasov

    khorenko@: v2: return changed moved_len only in case ioctl succeeds,
    otherwise leave it intact.
---
 fs/ext4/extents.c     |  2 ++
 fs/ext4/move_extent.c | 20
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 71b4b620..381cd54 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5574,9 +5574,11 @@ ext4_swap_extents(handle_t *handle, struct inode *inode1,
 	BUG_ON(!mutex_is_locked(&inode1->i_mutex));
 	BUG_ON(!mutex_is_locked(&inode2->i_mutex));
 
+	ext4_discard_preallocations(inode1);
 	*erp = ext4_es_remove_extent(inode1, lblk1, count);
 	if (unlikely(*erp))
 		return 0;
+	ext4_discard_preallocations(inode2);
 	*erp = ext4_es_remove_extent(inode2, lblk2, count);
 	if (unlikely(*erp))
 		return 0;
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index cdf5017..5029de7 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -551,6 +551,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
 	ext4_lblk_t o_end, o_start = orig_blk;
 	ext4_lblk_t d_start = donor_blk;
 	int ret;
+	__u64 m_len = *moved_len;
 
 	if (orig_inode->i_sb != donor_inode->i_sb) {
 		ext4_debug("ext4 move extent: The argument files "
@@ -607,7 +608,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
 
 		ret = get_ext_path(orig_inode, o_start, &path);
 		if (ret)
-			goto out;
+			break;
 		ex = path[path->p_depth].p_ext;
 		next_blk = ext4_ext_next_allocated_block(path);
 		cur_blk = le32_to_cpu(ex->ee_block);
@@ -617,7 +618,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
 			if (next_blk == EXT_MAX_BLOCKS) {
 				o_start = o_end;
 				ret = -ENODATA;
-				goto out;
+				break;
 			}
 			d_start += next_blk - o_start;
 			o_start = next_blk;
@@ -629,7 +630,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
 			o_start = cur_blk;
 			/* Extent inside requested range ?*/
 			if (cur_blk >= o_end)
-				goto out;
+				break;
 		} else { /* in_range(o_start, o_blk, o_len) */
 			cur_len += cur_blk - o_start;
 		}
@@ -662,17 +663,12 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
 			break;
 		o_start += cur_len;
 		d_start += cur_len;
+		m_len += cur_len;
 	}
-	*moved_len = o_start - orig_blk;
-	if (*moved_len > len)
-		*moved_len = len;
-
 out:
-	if (*moved_len) {
-		ext4_discard_preallocations(orig_inode);
-		ext4_discard_preallocations(donor_inode);
-	}
-
+	WARN_ON(m_len > len);
+	if (ret == 0)
+		*moved_len = m_len;
 	ext4_ext_drop_refs(path);
 	kfree(path);
 	ext4_double_up_write_data_sem(orig_inode, donor_inode);
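The v2 note about moved_len boils down to a commit-on-success pattern: progress is accumulated in a local variable and copied to the caller only when the whole operation succeeded. A simplified user-space sketch of that pattern follows, assuming the per-chunk lengths and error codes are supplied as arrays; model_move_extents and its parameters are hypothetical and do not mirror the real ext4 function signature.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the loop: chunk_err[i] is the result of moving
 * chunk i (0 on success, negative errno on failure), chunk_len[i] is the
 * number of blocks in that chunk.
 */
static int model_move_extents(const unsigned long long *chunk_len,
			      const int *chunk_err, size_t nchunks,
			      unsigned long long len,
			      unsigned long long *moved_len)
{
	unsigned long long m_len = *moved_len; /* local running total */
	int ret = 0;

	for (size_t i = 0; i < nchunks; i++) {
		ret = chunk_err[i];
		if (ret)
			break; /* leave the loop, keep *moved_len untouched */
		m_len += chunk_len[i];
	}

	/* The patch only warns here (WARN_ON); the model treats it as a bug. */
	assert(m_len <= len);

	if (ret == 0)
		*moved_len = m_len; /* publish the total only on success */
	return ret;
}
```

On failure the caller therefore sees the value it passed in, even if some chunks were processed before the error.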
[Devel] [PATCH] ext4: fix mkdir operations with overlayfs
ext4 supports extended operations such as rename2, but the inode isn't correctly marked after mkdir.

Signed-off-by: Alexey Lyashkov
---
 fs/ext4/namei.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 0adc6df..bebe698 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2413,6 +2413,7 @@ retry:
 
 	inode->i_op = &ext4_dir_inode_operations.ops;
 	inode->i_fop = &ext4_dir_operations;
+	inode->i_flags |= S_IOPS_WRAPPER;
 	err = ext4_init_new_dir(handle, dir, inode);
 	if (err)
 		goto out_clear_inode;
-- 
1.8.3.1
[Devel] [PATCH rh7 3/4] ms/kcov: don't profile branches in kcov
Profiling 'if' statements in __sanitizer_cov_trace_pc() leads to unbound recursion and crash:

	__sanitizer_cov_trace_pc()
	   -> ftrace_likely_update
		-> __sanitizer_cov_trace_pc()
		    ...

Define DISABLE_BRANCH_PROFILING to disable this tracer.

Signed-off-by: Andrey Ryabinin
Cc: Dmitry Vyukov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

(cherry picked from commit 36f05ae8bce904b4c8105363e6227a79d343bda6)
Signed-off-by: Andrey Ryabinin
---
 kernel/kcov.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index f963fca..91b00e6 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -1,5 +1,6 @@
 #define pr_fmt(fmt) "kcov: " fmt
 
+#define DISABLE_BRANCH_PROFILING
 #include
 #include
 #include
-- 
2.7.3
Re: [Devel] [PATCH] ext4: fix mkdir operations with overlayfs
Thanks! You may be interested in searching the devel@openvz.org archives for:

Subject: [PATCH rh7] ext4: ext4_mkdir must set S_IOPS_WRAPPER bit
Date: Mon, 25 Jul 2016 14:01:16 -0700

On 10/14/2016 09:47 AM, Vladimir Meshkov wrote:
> ext4 supports an extended operations like rename2, but inode isn't correctly marked after mkdir.
>
> Signed-off-by: Alexey Lyashkov
> ---
>  fs/ext4/namei.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 0adc6df..bebe698 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2413,6 +2413,7 @@ retry:
>
>  	inode->i_op = &ext4_dir_inode_operations.ops;
>  	inode->i_fop = &ext4_dir_operations;
> +	inode->i_flags |= S_IOPS_WRAPPER;
>  	err = ext4_init_new_dir(handle, dir, inode);
>  	if (err)
>  		goto out_clear_inode;
> --
> 1.8.3.1