Re: [PATCH v5 1/2] kernel/signal: Signal-based pre-coredump notification

2018-12-06 Thread Oleg Nesterov
On 12/04, Andrew Morton wrote: > > I'm not sure what to make of this patchset, really. Oleg sounds > unhappy and that's always a bad sign. And signals are rather ugly > things. Oleg, can you please expand on your concerns? I don't really know what can I say... Yes the signals are ugly things,

Re: [PATCH v2] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-12-05 Thread Oleg Nesterov
d_uprobe_list from remove_breakpoint(). Do it here. >*/ > + mutex_lock(_uprobe_lock); > delayed_uprobe_remove(uprobe, NULL); > + mutex_unlock(_uprobe_lock); > kfree(uprobe); Acked-by: Oleg Nesterov

Re: [PATCH] Revert "exec: make de_thread() freezable (was: Re: Linux 4.20-rc4)

2018-12-05 Thread Oleg Nesterov
Ingo, et al, Sorry, I am sick and can't participate this discussion right now, On 12/04, Ingo Molnar wrote: > > * Oleg Nesterov wrote: > > > we really need to narrow the (huge) scope of ->cred_guard_mutex in exec > > paths. > > > > my attempt to fix t

Re: [PATCH] Revert "exec: make de_thread() freezable (was: Re: Linux 4.20-rc4)

2018-12-03 Thread Oleg Nesterov
On 12/03, Michal Hocko wrote: > > On Mon 03-12-18 13:31:49, Oleg Nesterov wrote: > > On 12/03, Michal Hocko wrote: > > > > > > Now, I wouldn't mind to revert this because the code is really old and > > > we haven't seen many bug reports about failing suspend

Re: [PATCH] Revert "exec: make de_thread() freezable (was: Re: Linux 4.20-rc4)

2018-12-03 Thread Oleg Nesterov
On 12/03, Michal Hocko wrote: > > Now, I wouldn't mind to revert this because the code is really old and > we haven't seen many bug reports about failing suspend yet. But what is > the actual plan to make this work properly? I don't see a simple solution... But we need to fix exec/de_thread

Re: [PATCH] Revert "exec: make de_thread() freezable (was: Re: Linux 4.20-rc4)

2018-12-03 Thread Oleg Nesterov
On 12/03, Ingo Molnar wrote: > > So there's a new regression in v4.20-rc4, my desktop produces this > lockdep splat: > > [ 1772.588771] WARNING: pkexec/4633 still has locks held! > [ 1772.588773] 4.20.0-rc4-custom-00213-g93a49841322b #1 Not tainted > [ 1772.588775]

Re: [PATCH v5 1/2] kernel/signal: Signal-based pre-coredump notification

2018-11-30 Thread Oleg Nesterov
On 11/29, Dave Martin wrote: > > SIGCHLD + wait() is immune to this problem for other child status > notifications (albeit with higher overhead). > > Unless I've missed something fundamental, signals simply aren't a > reliable data transport Yes. But I hope we are not going to implement

Re: [PATCH v4 1/2] ptrace: save the type of syscall-stop in ptrace_message

2018-11-30 Thread Oleg Nesterov
On 11/30, Dmitry V. Levin wrote: > > On Thu, Nov 29, 2018 at 03:47:43PM +0100, Oleg Nesterov wrote: > > > > so that PTRACE_GETEVENTMSG users can easily tell > > > whether this new semantics is supported by the kernel or not. > > > > Yes. And how much t

Re: [PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-11-30 Thread Oleg Nesterov
used to prevent stray child > processes. Reviewed-by: Oleg Nesterov

Re: [PATCH] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-11-30 Thread Oleg Nesterov
On 11/29, Jürg Billeter wrote: > > On Thu, 2018-11-29 at 13:34 +0100, Oleg Nesterov wrote: > > To me it would be more clean to call > > walk_process_tree(kill_descendant_visitor) > > unconditionally in find_new_reaper() right before "if > > (has_child_sub

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-11-30 Thread Oleg Nesterov
On 11/29, Tycho Andersen wrote: > > /* > * These should never be seen by user programs. To return one of ERESTART* > * codes, signal_pending() MUST be set. Note that ptrace can observe these > * at syscall exit tracing, but they will never be left for the debugged user > * process to see. >

Re: [PATCH v4 1/2] ptrace: save the type of syscall-stop in ptrace_message

2018-11-29 Thread Oleg Nesterov
On 11/28, Andy Lutomirski wrote: > > I don't like any of this at all. Can we please choose a sensible API > design and let the API drive the implementation instead of vice versa? I too do not understand your concerns... > ISTM the correct solution is to add some new state to task_struct for >

Re: [PATCH v4 1/2] ptrace: save the type of syscall-stop in ptrace_message

2018-11-29 Thread Oleg Nesterov
On 11/29, Dmitry V. Levin wrote: > > 2. Document these values sure, they should be documented and live in include/uapi/, > chosen to avoid collisions with ptrace_message values > set by other ptrace events this is what I can't understand. But to clarify, I don't really care and won't argue. If

Re: [PATCH] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-11-29 Thread Oleg Nesterov
On 11/28, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > On 11/27, Jürg Billeter wrote: > >> > >> @@ -704,6 +713,9 @@ static void exit_notify(struct task_struct *tsk, int > >> group_dead) > >>struct task_struct *p, *n; > >

Re: [PATCH] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-11-28 Thread Oleg Nesterov
On 11/27, Jürg Billeter wrote: > > @@ -704,6 +713,9 @@ static void exit_notify(struct task_struct *tsk, int > group_dead) > struct task_struct *p, *n; > LIST_HEAD(dead); > > + if (group_dead && tsk->signal->kill_descendants_on_exit) > + walk_process_tree(tsk,

Re: [PATCH v4 2/2] ptrace: add PTRACE_GET_SYSCALL_INFO request

2018-11-28 Thread Oleg Nesterov
On 11/28, Oleg Nesterov wrote: > > On 11/28, Dmitry V. Levin wrote: > > > > +static unsigned long > > +ptrace_get_syscall_info_entry(struct task_struct *child, > > + struct ptrace_syscall_info *info) > > +{ > > +

Re: [PATCH v4 1/2] ptrace: save the type of syscall-stop in ptrace_message

2018-11-28 Thread Oleg Nesterov
On 11/28, Dmitry V. Levin wrote: > > On Wed, Nov 28, 2018 at 02:49:14PM +0100, Oleg Nesterov wrote: > > On 11/28, Dmitry V. Levin wrote: > > > > > > +/* > > > + * These values are stored in task->ptrace_message by > > > tracehook_report_s

Re: [PATCH v4 2/2] ptrace: add PTRACE_GET_SYSCALL_INFO request

2018-11-28 Thread Oleg Nesterov
On 11/28, Dmitry V. Levin wrote: > > +static unsigned long > +ptrace_get_syscall_info_entry(struct task_struct *child, > + struct ptrace_syscall_info *info) > +{ > + struct pt_regs *regs = task_pt_regs(child); > + unsigned long args[ARRAY_SIZE(info->entry.args)];

Re: [PATCH v4 1/2] ptrace: save the type of syscall-stop in ptrace_message

2018-11-28 Thread Oleg Nesterov
On 11/28, Dmitry V. Levin wrote: > > +/* > + * These values are stored in task->ptrace_message by > tracehook_report_syscall_* > + * to describe current syscall-stop. > + * > + * Values for these constants are chosen so that they do not appear > + * in task->ptrace_message by other means. > + */

Re: [RFC PATCH RESEND v3 3/3] ptrace: add PTRACE_EVENT_SECCOMP support to PTRACE_GET_SYSCALL_INFO

2018-11-28 Thread Oleg Nesterov
On 11/28, Dmitry V. Levin wrote: > > What about PTRACE_GETSIGINFO? Can it also be done lockless because > ptrace_check_attach() has already called ptrace_freeze_traced()? Yes, only PTRACE_INTERRUPT needs lock_task_sighand() and it can actually fail because the tracee is not necessarily stopped.

Re: [RFC PATCH RESEND v3 3/3] ptrace: add PTRACE_EVENT_SECCOMP support to PTRACE_GET_SYSCALL_INFO

2018-11-28 Thread Oleg Nesterov
On 11/28, Dmitry V. Levin wrote: > > > Just like ptrace_request(PTRACE_LISTEN) > > does but you can do this lockless (no need to lock_task_sighand()). > > Why this can be done lockless? All other places in that file do > the locking, PTRACE_LISTEN too doesn't need lock_task_sighand() to access

Re: [PATCH] ptrace: take into account saved_sigmask in PTRACE_{GET,SET}SIGMASK

2018-11-27 Thread Oleg Nesterov
On 11/26, Andrei Vagin wrote: > > > IOW, could you please explain how PTRACE_SETSIGMASK should be used, and why > > it doesn't do something like > > > > CRIU uses PTRACE_SETSIGMASK when it injects a parasite code into a > target process. In this case, we have to be sure that when the process > is

Re: [RFC PATCH RESEND v3 3/3] ptrace: add PTRACE_EVENT_SECCOMP support to PTRACE_GET_SYSCALL_INFO

2018-11-27 Thread Oleg Nesterov
On 11/27, Elvira Khabirova wrote: > > On Mon, 26 Nov 2018 15:35:24 +0100 > Oleg Nesterov wrote: > > > On 11/25, Elvira Khabirova wrote: > > > > > > Extend PTRACE_GET_SYSCALL_INFO to support PTRACE_EVENT_SECCOMP stops. > > > The information re

Re: [PATCH RESEND v3 1/3] ptrace: pass type of a syscall-stop in ptrace_message

2018-11-26 Thread Oleg Nesterov
On 11/25, Elvira Khabirova wrote: > > + * These values are stored in task->ptrace_message by > tracehook_report_syscall_* > + * to describe current syscall-stop. > + * > + * Values for these constants are chosen so that they do not appear > + * in task->ptrace_message by other means. > + */ >

Re: [RFC PATCH RESEND v3 3/3] ptrace: add PTRACE_EVENT_SECCOMP support to PTRACE_GET_SYSCALL_INFO

2018-11-26 Thread Oleg Nesterov
On 11/25, Elvira Khabirova wrote: > > Extend PTRACE_GET_SYSCALL_INFO to support PTRACE_EVENT_SECCOMP stops. > The information returned is the same as for syscall-enter-stops. Oh, this is not nice ;) there must be a better option, I hope... Plus Can't ptrace_get_syscall() check

Re: [PATCH] exec: separate MM_ANONPAGES and RLIMIT_STACK accounting

2018-11-26 Thread Oleg Nesterov
Hi Guenter, On 11/23, Guenter Roeck wrote: > > On Mon, Nov 12, 2018 at 05:09:10PM +0100, Oleg Nesterov wrote: > > get_arg_page() checks bprm->rlim_stack.rlim_cur and re-calculates the > > "extra" size for argv/envp pointers every time, this is a bit ugly an

Re: [PATCH] uprobes: Fix handle_swbp() vs. unregister() + register() race once more

2018-11-22 Thread Oleg Nesterov
On 11/22, Oleg Nesterov wrote: > On 11/22, Andrea Parri wrote: > > > > Commit 142b18ddc8143 ("uprobes: Fix handle_swbp() vs unregister() + > > register() race") added the UPROBE_COPY_INSN flag, and corresponding > > smp_wmb() and smp_rmb() memory barriers,

Re: [PATCH] uprobes: Fix handle_swbp() vs. unregister() + register() race once more

2018-11-22 Thread Oleg Nesterov
t; Move the smp_rmb() accordingly. Also amend the comments associated > to the two memory barriers to indicate their actual locations. > > Signed-off-by: Andrea Parri > Cc: Peter Zijlstra > Cc: Ingo Molnar > Cc: Arnaldo Carvalho de Melo > Cc: Alexander Shishkin > Cc: Jiri Olsa &g

Re: [Question] smp_wmb() in prepare_uprobe()

2018-11-22 Thread Oleg Nesterov
On 11/22, Andrea Parri wrote: > > > See 142b18ddc81439acda4bc4231b291e99fe67d507 ("uprobes: Fix handle_swbp() > > vs unregister() + register() race") and the comment above this rmb(). > > Mmh..., at first glance, this suggests me that the above set_bit() and > test_bit() to/from uprobe->flags are

Re: uprobe support mix of ftrace and perf

2018-11-22 Thread Oleg Nesterov
Hi Masami, On 11/20, Masami Hiramatsu wrote: > > Hello Oleg, > > I have a question about the (revert) commit 48212542067a > ("tracing/uprobes: Revert "Support mix of ftrace and perf""). I found > this limitation still exists on uprobe events. Yes... > Would you have any plan or idea to fix this

Re: [Question] smp_wmb() in prepare_uprobe()

2018-11-22 Thread Oleg Nesterov
Hi, On 11/21, Andrea Parri wrote: > > The comment for the smp_wmb() in prepare_uprobe() says: > > "pairs with rmb() in find_active_uprobe()" it seems that this comment was wrong from the very beginning, > but I see no (smp_)rmb() in find_active_uprobe(); I see the smp_rmb() in >

Re: [PATCH 2/2] exec: increase BINPRM_BUF_SIZE to 256

2018-11-22 Thread Oleg Nesterov
On 11/16, Alan Cox wrote: > > On Mon, 12 Nov 2018 17:09:56 +0100 > Oleg Nesterov wrote: > > > Large enterprise clients often times run applications out of networked > > file systems where the IT mandated layout of project volumes can end up > > leading to paths that

Re: [PATCH] ptrace: take into account saved_sigmask in PTRACE_{GET,SET}SIGMASK

2018-11-22 Thread Oleg Nesterov
On 11/19, Andrei Vagin wrote: > > case PTRACE_SETSIGMASK: { > sigset_t new_set; > @@ -962,6 +971,8 @@ int ptrace_request(struct task_struct *child, long > request, > child->blocked = new_set; > spin_unlock_irq(>sighand->siglock); > > +

Re: [PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-11-15 Thread Oleg Nesterov
On 11/15, Ravi Bangoria wrote: > > There could be a race between task exit and probe unregister: > > exit_mm() > mmput() > __mmput() uprobe_unregister() > uprobe_clear_state() put_uprobe() > delayed_uprobe_remove() delayed_uprobe_remove() > >

Re: [PATCH v2 3/6] cgroup: cgroup v2 freezer

2018-11-14 Thread Oleg Nesterov
On 11/14, Roman Gushchin wrote: > > > No, this is very wrong. Just suppose the caller is killed right before > > clear_thread_flag(TIF_SIGPENDING). > > So, I had TASK_KILLABLE before, but had some issues with ptrace/gdb. > I'll revisit this option. Yes, ptrace_signal_wake_up() won't wake a

Re: [PATCH v2 3/6] cgroup: cgroup v2 freezer

2018-11-14 Thread Oleg Nesterov
Hi Roman, On 11/13, Roman Gushchin wrote: > > > > +#define TASK_FROZEN 0x1000 > > > +#define TASK_STATE_MAX 0x2000 > > > > Just noticed the new task state... Why? Can't we avoid it? > > We can, but it's nice to show to userspace that tasks are frozen, >

Re: [PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-11-14 Thread Oleg Nesterov
On 11/14, Ravi Bangoria wrote: > > syzbot reported a kernel crash with delayed_uprobe_remove(): > https://lkml.org/lkml/2018/11/1/1244 > > Backtrace mentioned in the link points to a race between process > exit and uprobe_unregister(). Fix it by locking delayed_uprobe_lock > before calling

Re: [PATCH 2/2] exec: increase BINPRM_BUF_SIZE to 256

2018-11-14 Thread Oleg Nesterov
On 11/13, Andrew Morton wrote: > > On Tue, 13 Nov 2018 17:55:58 +0100 Oleg Nesterov wrote: > > > > However it would be basically cost-free to increase > > > BINPRM_BUF_SIZE up to the point where sizeof(struct linux_binprm) == > > > PAGE_SIZE? > > >

Re: [PATCH v2] exec: make de_thread() freezable

2018-11-14 Thread Oleg Nesterov
On 11/14, Michal Hocko wrote: > > > I don't understand why it isn't appropriate for exec to block. The > > exec can freeze. When tasks are thawed, the killed sub-thread will die > > and wake de_thread(). The exec will continue to work from resume. > > Because this is fragile. I don't really

Re: [PATCH v2] exec: make de_thread() freezable

2018-11-14 Thread Oleg Nesterov
On 11/13, Michal Hocko wrote: > > > > > > > > > To fix this, make de_thread() freezable. It looks safe and works fine. > > > > > > It's been some time since I have looked into this code so bear with me. > > > One thing is not really clear to me. Why does it help to exclude this > > > particular

Re: [PATCH 2/2] exec: increase BINPRM_BUF_SIZE to 256

2018-11-13 Thread Oleg Nesterov
On 11/12, Andrew Morton wrote: > On Mon, 12 Nov 2018 17:09:56 +0100 Oleg Nesterov wrote: > > > /* sizeof(linux_binprm->buf) */ > > -#define BINPRM_BUF_SIZE 128 > > +#define BINPRM_BUF_SIZE 256 > > > > #endif /* _UAPI_LINUX_BINFMTS_H */ > > It d

Re: [PATCH 1/2] exec: load_script: don't blindly truncate shebang string

2018-11-13 Thread Oleg Nesterov
On 11/13, Michal Hocko wrote: > > A bit cryptic to my taste Ys, because I didn't want to touch the code below. We need to cleanup the whole "parse bprm->buf" code, not only this part. > but it looks correct. I have tried to come up > with something more tasty but I am afraid it would be just

Re: [PATCH v2] exec: make de_thread() freezable

2018-11-13 Thread Oleg Nesterov
On 11/13, Michal Hocko wrote: > > On Mon 12-11-18 12:54:45, Chanho Min wrote: > > Suspend fails due to the exec family of functions blocking the freezer. > > The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for > > all sub-threads to die, and we have the deadlock if one of them

Re: [PATCH v2 3/6] cgroup: cgroup v2 freezer

2018-11-13 Thread Oleg Nesterov
Hi Tejun, On 11/13, Tejun Heo wrote: > > > OK, please forget for now, but perhaps it would be more clean to add > > JOBCTL_TRAP_FREEZE to the JOBCTL_PENDING_MASK check in recalc_sigpending() > > and change get_signal to check JOBCTL_TRAP_MASK | JOBCTL_TRAP_FREEZE; and > > I am not even sure

Re: [PATCH v2 3/6] cgroup: cgroup v2 freezer

2018-11-13 Thread Oleg Nesterov
On 11/12, Roman Gushchin wrote: > > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -83,7 +83,8 @@ struct task_group; > #define TASK_WAKING 0x0200 > #define TASK_NOLOAD 0x0400 > #define TASK_NEW 0x0800 > -#define

Re: [PATCH v2 3/6] cgroup: cgroup v2 freezer

2018-11-13 Thread Oleg Nesterov
On 11/12, Roman Gushchin wrote: > > This patch implements freezer for cgroup v2. However the functionality > is similar, the interface is different to cgroup v1: it follows > cgroup v2 interface principles. Oh, it seems that I actually need to apply this patch to (try to) understand the details

[PATCH 2/2] exec: increase BINPRM_BUF_SIZE to 256

2018-11-12 Thread Oleg Nesterov
still fitting into a 512b slab. Reported-by: Ben Woodard Signed-off-by: Oleg Nesterov --- include/uapi/linux/binfmts.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/binfmts.h b/include/uapi/linux/binfmts.h index 4abad03..689025d 100644 --- a/include/uapi

[PATCH 1/2] exec: load_script: don't blindly truncate shebang string

2018-11-12 Thread Oleg Nesterov
XEC if it can't find '\n' or zero in bprm->buf. Note that '\0' can come from either prepare_binprm()->memset() or from kernel_read(), we do not care. Signed-off-by: Oleg Nesterov --- fs/binfmt_script.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/binfmt_scri

[PATCH] exec: separate MM_ANONPAGES and RLIMIT_STACK accounting

2018-11-12 Thread Oleg Nesterov
rgmin calculated once at the start of __do_execve_file() and change copy_strings to check bprm->p >= bprm->argmin. The patch adds the new helper, prepare_arg_pages() which initializes bprm->argc/envc and bprm->argmin. Signed-off-by: Oleg Nesterov ---

Re: [PATCH v2] exec: make de_thread() freezable

2018-11-12 Thread Oleg Nesterov
goto killed; > } Thanks, looks good to me. Acked-by: Oleg Nesterov

Re: [PATCH] exec: make de_thread() freezable

2018-11-09 Thread Oleg Nesterov
On 11/09, Chanho Min wrote: > > @@ -1083,7 +1084,7 @@ static int de_thread(struct task_struct *tsk) > while (sig->notify_count) { > __set_current_state(TASK_KILLABLE); > spin_unlock_irq(lock); > - schedule(); > + freezable_schedule();

Re: [RFC PATCH] ptrace: add PTRACE_GET_SYSCALL_INFO request

2018-11-08 Thread Oleg Nesterov
On 11/07, Andy Lutomirski wrote: > > > On Nov 7, 2018, at 8:44 AM, Oleg Nesterov wrote: > > > > Not sure I understand you... I do not like "compat" too, but this patch uses > > is_compat/etc and I agree with any naming. > > My point is: returning a valu

Re: [PATCH] freezer: fix freeze timeout on exec

2018-11-08 Thread Oleg Nesterov
On 11/08, Chanho Min wrote: > > Suspend fails due to the exec family of fuctnions blocking the freezer. > This issue has been found that it is mentioned in the ancient mail thread. > The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for all > sub-threads to die, and we have the

Re: [RFC PATCH] ptrace: add PTRACE_GET_SYSCALL_INFO request

2018-11-08 Thread Oleg Nesterov
On 11/07, Elvira Khabirova wrote: > > On Wed, 7 Nov 2018 17:44:44 +0100 > Oleg Nesterov wrote: > > > To me PT_IN_SYSCALL_STOP makes no real sense, but I won't argue. > > > > At least I'd ask to not abuse task->ptrace. ptrace_report_syscall() can > > c

Re: [RFC PATCH] ptrace: add PTRACE_GET_SYSCALL_INFO request

2018-11-07 Thread Oleg Nesterov
On 11/07, Andy Lutomirski wrote: > > > > On Nov 7, 2018, at 3:21 AM, Oleg Nesterov wrote: > > > >> On 11/07, Elvira Khabirova wrote: > >> > >> In short, if a 64-bit task performs a syscall through int 0x80, its tracer > >> has no re

Re: [RFC PATCH] ptrace: add PTRACE_GET_SYSCALL_INFO request

2018-11-07 Thread Oleg Nesterov
On 11/07, Elvira Khabirova wrote: > > In short, if a 64-bit task performs a syscall through int 0x80, its tracer > has no reliable means to find out that the syscall was, in fact, > a compat syscall, and misidentifies it. > * Syscall-enter-stop and syscall-exit-stop look the same for the tracer.

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-11-02 Thread Oleg Nesterov
On 11/01, Tycho Andersen wrote: > > On Thu, Nov 01, 2018 at 03:48:05PM +0100, Oleg Nesterov wrote: > > > > > > But my main concern is that either way wait_for_completion_killable() > > > > allows > > > > to trivially create a process w

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-11-02 Thread Oleg Nesterov
On 11/01, Tycho Andersen wrote: > > On Thu, Nov 01, 2018 at 02:40:02PM +0100, Oleg Nesterov wrote: > > > > Somehow I no longer understand why do you need to take all locks. Isn't > > the first filter's notify_lock enough? IOW, > > > > for (cur = cur

Re: [PATCH RFC kenrel/rcu] Eliminate BUG_ON() for sync.c

2018-11-01 Thread Oleg Nesterov
On 11/01, Paul E. McKenney wrote: > > Any news on exactly which patch constituted the reworking of this > code some time back? Again, I never sent a patch, I simply showed the new code (more than 2 years ago ;), see below. I need to re-read our discussiong, but iirc your and Peter's reviews were

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-11-01 Thread Oleg Nesterov
On 10/30, Tycho Andersen wrote: > > > I am not sure I understand the value of > > signaled/SECCOMP_NOTIF_FLAG_SIGNALED... > > I mean, why it is actually useful? > > > > Sorry if this was already discussed. > > :) no problem, many people have complained about this. This is an > implementation of

Re: [PATCH RFC kenrel/rcu] Eliminate BUG_ON() for sync.c

2018-11-01 Thread Oleg Nesterov
On 10/31, Paul E. McKenney wrote: > > Oy... Right message, wrong commit. > > Does the one below look somewhat more relevant? ;-) Much more relevant ;) feel free to add Acked-by: Oleg Nesterov

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-11-01 Thread Oleg Nesterov
On 10/29, Tycho Andersen wrote: > > +static int seccomp_notify_release(struct inode *inode, struct file *file) > +{ > + struct seccomp_filter *filter = file->private_data; > + struct seccomp_knotif *knotif; > + > + mutex_lock(>notify_lock); > + > + /* > + * If this file is

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-11-01 Thread Oleg Nesterov
On 10/29, Tycho Andersen wrote: > > +static struct file *init_listener(struct seccomp_filter *filter) > +{ > + struct file *ret = ERR_PTR(-EBUSY); > + struct seccomp_filter *cur, *last_locked = NULL; > + int filter_nesting = 0; > + > + for (cur = current->seccomp.filter; cur; cur =

Re: [PATCH RFC kenrel/rcu] Eliminate BUG_ON() for sync.c

2018-10-31 Thread Oleg Nesterov
On 10/30, Paul E. McKenney wrote: > > On Mon, Oct 22, 2018 at 06:14:40PM +0200, Oleg Nesterov wrote: > > > > > > > > Damn. > > > > > > > > This suddenly r

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-31 Thread Oleg Nesterov
On 10/31, Daniel Colascione wrote: > > > Confused... why? kill_ok_by_cred() should fail? > > Not if we don't run it. :-) I thought you were proposing that we do > *all* access checks in open() and let write() succeed unconditionally, Ah, no ;) > Anyway, I sent a v2 patch that I think closes the

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-31 Thread Oleg Nesterov
On 10/31, Daniel Colascione wrote: > > > perhaps it would be simpler to do > > > > my_cred = override_creds(file->f_cred); > > kill_pid(...); > > revert_creds(my_cred); > > Thanks for the suggestion. That looks neat, but it's not quite enough. > The problem is that

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-31 Thread Oleg Nesterov
On 10/30, Kees Cook wrote: > > I'd like to avoid changing the return value of __secure_computing() to > just avoid having to touch all the callers. And I'd prefer not to > change __seccomp_filter() to a bool, since I'd like the return values > to be consistent through the call chain. Sure, please

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-31 Thread Oleg Nesterov
On 10/30, Eric W. Biederman wrote: > > At a bare minimum you need to perform the permission check using the > credentials of the opener of the file.Which means refactoring > kill_pid so that you can perform the permission check for killing the > application during open. perhaps it would be

Re: [PATCH v5] kernel/signal: Signal-based pre-coredump notification

2018-10-30 Thread Oleg Nesterov
On 10/29, Enke Chen wrote: > > Reviewed-by: Oleg Nesterov Hmm. I didn't say this ;) But OK, feel free to keep this tag. I do not like this feauture. But I see no technical problems in this version and I never pretented I understand the user-space needs, so I won't argue. Oleg.

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Oleg Nesterov
On 10/30, Oleg Nesterov wrote: > > On 10/30, Tycho Andersen wrote: > > > > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const > > struct seccomp_data *sd, > > */ > > rmb(); > > > > + if (!sd) { > > +

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Oleg Nesterov
On 10/30, Tycho Andersen wrote: > > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const > struct seccomp_data *sd, >*/ > rmb(); > > + if (!sd) { > + populate_seccomp_data(_local); > + sd = _local; > + } > + To me it would be

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Oleg Nesterov
On 10/29, Tycho Andersen wrote: > > +static long seccomp_notify_recv(struct seccomp_filter *filter, > + void __user *buf) > +{ > + struct seccomp_knotif *knotif = NULL, *cur; > + struct seccomp_notif unotif; > + ssize_t ret; > + > + memset(, 0,

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Oleg Nesterov
On 10/29, Tycho Andersen wrote: > > + /* This is where we wait for a reply from userspace. */ > + err = wait_for_completion_interruptible(); > + mutex_lock(>notify_lock); > + > + /* > + * If the noticiation fd died before we re-acquired the lock, we still > + * give

yama: unsafe usage of ptrace_relation->tracer

2018-10-29 Thread Oleg Nesterov
let me change the subject to avoid the confusion with the already confusing disccussion about task_is_descendant(). On 10/29, Oleg Nesterov wrote: > > I still think we need a single pid_alive() check and I even sent the patch. > Attached again at the end. > > To clarify

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-29 Thread Oleg Nesterov
On 10/26, Kees Cook wrote: > > On Thu, Oct 25, 2018 at 2:01 PM, Oleg Nesterov wrote: > > On 10/25, Oleg Nesterov wrote: > >> perhaps it needs some changes too. I even have a vague feeling that I have > >> already > >> blamed this function some ti

Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-29 Thread Oleg Nesterov
Hi, On 10/26, Enke Chen wrote: > > This is really a good idea given that "parent" is declared as RCU-protected. > Just a bit odd, though, that the "parent" has not been accessed this way in > the code base. It is acccessed when possible, > So just to confirm: the revised code would look like

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-26 Thread Oleg Nesterov
On 10/27, Tetsuo Handa wrote: > > On 2018/10/26 23:39, Oleg Nesterov wrote: > > On 10/26, Tetsuo Handa wrote: > >> Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited > >> when someone tried to attach on p2, p2->real_parent was pointing to

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-26 Thread Oleg Nesterov
On 10/26, Tetsuo Handa wrote: > > On 2018/10/26 22:04, Oleg Nesterov wrote: > >> Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited > >> when p2 tried to attach on p1, p2->real_parent was pointing to already > >> (or about to

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-26 Thread Oleg Nesterov
On 10/26, Tetsuo Handa wrote: > > Since the "child" passed to task_is_descendant() has at least one reference > count taken by find_get_task_by_vpid(), rcu_dereference(walker->real_parent) > in the first iteration > > while (child->pid > 0) { > if (!thread_group_leader(child)) > walker

Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-26 Thread Oleg Nesterov
On 10/25, Enke Chen wrote: > > +static void do_notify_parent_predump(void) > +{ > + struct task_struct *parent; > + int sig; > + > + read_lock(_lock); > + parent = current->parent; > + sig = parent->signal->predump_signal; > + if (sig != 0) > +

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-25 Thread Oleg Nesterov
On 10/25, Oleg Nesterov wrote: > > Because our rcu read-lock critical section extends beyond the return from > synchronize_rcu(), and thus we must have a full memory barrier _between_ > that synchronize_rcu() and our rcu_read_lock(). We must see all memory > updates, > including

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-25 Thread Oleg Nesterov
On 10/25, Tetsuo Handa wrote: > > On 2018/10/25 21:17, Oleg Nesterov wrote: > >>> And yes, task_is_descendant() can hit the dead child, if nothing else it > >>> can > >>> be killed. This can explain the kasan report. > >> > >> The kasa

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-25 Thread Oleg Nesterov
On 10/25, Oleg Nesterov wrote: > > As I said below, please ignore ptracer_exception_found(), another caller for > now, > perhaps it needs some changes too. I even have a vague feeling that I have > already > blamed this function some time ago... Heh, yes, 3 years ago ;) https

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-25 Thread Oleg Nesterov
On 10/25, Tetsuo Handa wrote: > > On 2018/10/25 20:13, Oleg Nesterov wrote: > > > > So again, suppose that "child" is already dead. Its task_struct can't be > > freed, > > but child->real_parent can point to the already freed memory. > &g

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-25 Thread Oleg Nesterov
On 10/25, Kees Cook wrote: > > On Thu, Oct 25, 2018 at 12:13 PM, Oleg Nesterov wrote: > > So again, suppose that "child" is already dead. Its task_struct can't be > > freed, > > but child->real_parent can point to the already freed memory. > > I can'

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-25 Thread Oleg Nesterov
On 10/25, Kees Cook wrote: > > task_is_descendant() is called under rcu_read_lock() in both > ptracer_exception_found() and yama_ptrace_access_check() so I don't > understand how any of the tasks could get freed? This is walking > group_leader and real_parent -- are these not stable under

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-25 Thread Oleg Nesterov
On 10/25, Tetsuo Handa wrote: > > Oleg Nesterov wrote: > > On 10/22, Tetsuo Handa wrote: > > > > And again, I do not know how/if yama ensures that child is > > > > rcu-protected, perhaps > > > > task_is_descendant() needs to check pid

Re: [PATCH v3] kernel/signal: Signal-based pre-coredump notification

2018-10-24 Thread Oleg Nesterov
On 10/23, Enke Chen wrote: > > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo) > if (retval < 0) > goto fail_creds; > > + /* > + * Send the pre-coredump signal to the parent if requested. > + */ >

Re: [PATCH v2] kernel/signal: Signal-based pre-coredump notification

2018-10-24 Thread Oleg Nesterov
On 10/23, Enke Chen wrote: > > >> + /* > >> + * Send the pre-coredump signal to the parent if requested. > >> + */ > >> + read_lock(_lock); > >> + notify = do_notify_parent_predump(current); > >> + read_unlock(_lock); > >> + if (notify) > >> + cond_resched(); > > > > Hmm. I do

Re: [PATCH v2] kernel/signal: Signal-based pre-coredump notification

2018-10-23 Thread Oleg Nesterov
On 10/22, Enke Chen wrote: > > As the coredump of a process may take time, in certain time-sensitive > applications it is necessary for a parent process (e.g., a process > manager) to be notified of a child's imminent death before the coredump > so that the parent process can act sooner, such as

Re: [PATCH RFC kenrel/rcu] Eliminate BUG_ON() for sync.c

2018-10-22 Thread Oleg Nesterov
On 10/22, Paul E. McKenney wrote: > > > > @@ -125,12 +125,12 @@ void rcu_sync_enter(struct rcu_sync *rsp) > > > rsp->gp_state = GP_PENDING; > > > spin_unlock_irq(>rss_lock); > > > > > > - BUG_ON(need_wait && need_sync); > > > - > > > if (need_sync) { > > >

Re: [PATCH RFC kenrel/rcu] Eliminate BUG_ON() for sync.c

2018-10-22 Thread Oleg Nesterov
On 10/22, Paul E. McKenney wrote: > > The sync.c file has a number of calls to BUG_ON(), which panics the > kernel, which is not a good Agreed. I added these BUG_ON's for documentation when I was prototyping this code, perhaps we can simply remove them. > @@ -125,12 +125,12 @@ void

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-22 Thread Oleg Nesterov
On 10/22, Tetsuo Handa wrote: > > > However, task_is_descendant() looks unnecessarily complicated, it could be > > > > static int task_is_descendant(struct task_struct *parent, > > struct task_struct *child) > > { > > int rc = 0; > >

Re: KASAN: use-after-free Read in task_is_descendant

2018-10-22 Thread Oleg Nesterov
On 10/21, Tetsuo Handa wrote: > > On 2018/10/21 16:10, syzbot wrote: > > BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:188 > > [inline] > > BUG: KASAN: use-after-free in task_is_descendant.part.2+0x610/0x670 > > security/yama/yama_lsm.c:295 > > Read of size 8 at addr

Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

2018-10-19 Thread Oleg Nesterov
On 10/18, Andy Lutomirski wrote: > > Oleg, the code in kernel/signal.c: > > preempt_disable(); > read_unlock(_lock); > preempt_enable_no_resched(); > freezable_schedule(); > > looks bogus. I don't get what it's trying to achieve with

Re: [PATCH] kernel/signal: Signal-based pre-coredump notification

2018-10-16 Thread Oleg Nesterov
On 10/15, Enke Chen wrote: > > > I don't understand why we need valid_predump_signal() at all. > > Most of the signals have well-defined semantics, and would not be appropriate > for this purpose. you are going to change the rules anyway. > That is why it is limited to only SIGCHLD, SIGUSR1,

Re: [PATCH] kernel/signal: Signal-based pre-coredump notification

2018-10-15 Thread Oleg Nesterov
On 10/12, Enke Chen wrote: > > For simplicity and consistency, this patch provides an implementation > for signal-based fault notification prior to the coredump of a child > process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can > be used by an application to express its interest

Re: [PATCH] kernel/hung_task.c: disable on suspend

2018-09-17 Thread Oleg Nesterov
On 09/17, Rafael J. Wysocki wrote: > > On Fri, Sep 14, 2018 at 6:21 PM Oleg Nesterov wrote: > > > > > > Since you are adding the notifier anyway, what about designing it to > > > > make > > > > the thread wait on _PREPARE until the notifier kicks i

Re: [RFC][PATCH 2/3] exec: Simplify unshare_files

2018-09-17 Thread Oleg Nesterov
absolutely off-topic question, On 09/16, Eric W. Biederman wrote: > > @@ -747,11 +746,9 @@ void do_coredump(const siginfo_t *siginfo) > } > > /* get us an unshared descriptor table; almost always a no-op */ > - retval = unshare_files(); > + retval = unshare_files(); I fail to

Re: [RFC][PATCH 0/3] exec: Moving unshare_files_struct

2018-09-17 Thread Oleg Nesterov
> Eric W. Biederman (3): > exec: Move unshare_files down to avoid locks being dropped on exec. > exec: Simplify unshare_files > exec: Remove reset_files_struct Reviewed-by: Oleg Nesterov

  1   2   3   4   5   6   7   8   9   10   >