Re: [RFC PATCH for 4.21 06/16] cpu_opv: Provide cpu_opv system call (v8)
----- On Oct 17, 2018, at 11:11 AM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:

> ----- On Oct 17, 2018, at 3:19 AM, Srikar Dronamraju sri...@linux.vnet.ibm.com wrote:
>
>> Hi Mathieu,
>>
>>> +static int do_cpu_opv(struct cpu_op *cpuop, int cpuopcnt,
>>> +		      struct cpu_opv_vaddr *vaddr_ptrs, int cpu)
>>> +{
>>> +	struct mm_struct *mm = current->mm;
>>> +	int ret;
>>> +
>>> +retry:
>>> +	if (cpu != raw_smp_processor_id()) {
>>> +		ret = push_task_to_cpu(current, cpu);
>>> +		if (ret)
>>> +			goto check_online;
>>> +	}
>>> +	down_read(&mm->mmap_sem);
>>> +	ret = vaddr_ptrs_check(vaddr_ptrs);
>>> +	if (ret)
>>> +		goto end;
>>> +	preempt_disable();
>>> +	if (cpu != smp_processor_id()) {
>>> +		preempt_enable();
>>> +		up_read(&mm->mmap_sem);
>>> +		goto retry;
>>> +	}
>>
>> If we have a higher priority task/s either pinned to the cpu, don't we end
>> up busy-looping till the task exits/sleeps?
>
> You're right!
>
> How about we ditch the thread migration altogether, and simply perform
> the cpu_opv operations in an IPI handler?
>
> This is possible now that cpu_opv uses a temporary vmap() rather than
> trying to touch the user-space page through the current thread's page
> table.
>
> Thoughts?
Here is the associated implementation on top of this patchset:

commit 759c5a8860d867091e168900329f0955e5101989
Author: Mathieu Desnoyers
Date:   Wed Oct 17 11:32:02 2018 -0400

    cpu opv: use ipi

diff --git a/kernel/cpu_opv.c b/kernel/cpu_opv.c
index db144b71d51a..30405e0cc049 100644
--- a/kernel/cpu_opv.c
+++ b/kernel/cpu_opv.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1039,41 +1040,48 @@ static int vaddr_ptrs_check(struct cpu_opv_vaddr *vaddr_ptrs)
 	return 0;
 }
 
+struct opv_ipi_args {
+	struct cpu_op *cpuop;
+	int cpuopcnt;
+	int ret;
+};
+
+static void cpu_opv_ipi(void *info)
+{
+	struct opv_ipi_args *args = info;
+
+	rseq_preempt(current);
+	args->ret = __do_cpu_opv(args->cpuop, args->cpuopcnt);
+}
+
 static int do_cpu_opv(struct cpu_op *cpuop, int cpuopcnt,
 		      struct cpu_opv_vaddr *vaddr_ptrs, int cpu)
 {
 	struct mm_struct *mm = current->mm;
+	struct opv_ipi_args args = {
+		.cpuop = cpuop,
+		.cpuopcnt = cpuopcnt,
+	};
 	int ret;
 
 retry:
-	if (cpu != raw_smp_processor_id()) {
-		ret = push_task_to_cpu(current, cpu);
-		if (ret)
-			goto check_online;
-	}
+	if (!cpumask_test_cpu(cpu, &current->cpus_allowed))
+		return -EINVAL;
 	down_read(&mm->mmap_sem);
 	ret = vaddr_ptrs_check(vaddr_ptrs);
 	if (ret)
 		goto end;
-	preempt_disable();
-	if (cpu != smp_processor_id()) {
-		preempt_enable();
+	ret = smp_call_function_single(cpu, cpu_opv_ipi, &args, 1);
+	if (ret) {
 		up_read(&mm->mmap_sem);
-		goto retry;
+		goto check_online;
 	}
-	ret = __do_cpu_opv(cpuop, cpuopcnt);
-	preempt_enable();
+	ret = args.ret;
 end:
 	up_read(&mm->mmap_sem);
 	return ret;
 
 check_online:
-	/*
-	 * push_task_to_cpu() returns -EINVAL if the requested cpu is not part
-	 * of the current thread's cpus_allowed mask.
-	 */
-	if (ret == -EINVAL)
-		return ret;
 	get_online_cpus();
 	if (cpu_online(cpu)) {
 		put_online_cpus();

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Re: [RFC PATCH for 4.21 06/16] cpu_opv: Provide cpu_opv system call (v8)
----- On Oct 17, 2018, at 3:19 AM, Srikar Dronamraju sri...@linux.vnet.ibm.com wrote:

> Hi Mathieu,
>
>> +static int do_cpu_opv(struct cpu_op *cpuop, int cpuopcnt,
>> +		      struct cpu_opv_vaddr *vaddr_ptrs, int cpu)
>> +{
>> +	struct mm_struct *mm = current->mm;
>> +	int ret;
>> +
>> +retry:
>> +	if (cpu != raw_smp_processor_id()) {
>> +		ret = push_task_to_cpu(current, cpu);
>> +		if (ret)
>> +			goto check_online;
>> +	}
>> +	down_read(&mm->mmap_sem);
>> +	ret = vaddr_ptrs_check(vaddr_ptrs);
>> +	if (ret)
>> +		goto end;
>> +	preempt_disable();
>> +	if (cpu != smp_processor_id()) {
>> +		preempt_enable();
>> +		up_read(&mm->mmap_sem);
>> +		goto retry;
>> +	}
>
> If we have a higher priority task/s either pinned to the cpu, don't we end
> up busy-looping till the task exits/sleeps?

You're right!

How about we ditch the thread migration altogether, and simply perform
the cpu_opv operations in an IPI handler?

This is possible now that cpu_opv uses a temporary vmap() rather than
trying to touch the user-space page through the current thread's page table.

Thoughts?

Thanks,

Mathieu

>
>> +	ret = __do_cpu_opv(cpuop, cpuopcnt);
>> +	preempt_enable();
>> +end:
>> +	up_read(&mm->mmap_sem);
>> +	return ret;
>> +
>> +check_online:
>> +	/*
>> +	 * push_task_to_cpu() returns -EINVAL if the requested cpu is not part
>> +	 * of the current thread's cpus_allowed mask.
>> +	 */
>> +	if (ret == -EINVAL)
>> +		return ret;
>> +	get_online_cpus();
>> +	if (cpu_online(cpu)) {
>> +		put_online_cpus();
>> +		goto retry;
>> +	}

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Re: [RFC PATCH for 4.21 06/16] cpu_opv: Provide cpu_opv system call (v8)
Hi Mathieu,

> +static int do_cpu_opv(struct cpu_op *cpuop, int cpuopcnt,
> +		      struct cpu_opv_vaddr *vaddr_ptrs, int cpu)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int ret;
> +
> +retry:
> +	if (cpu != raw_smp_processor_id()) {
> +		ret = push_task_to_cpu(current, cpu);
> +		if (ret)
> +			goto check_online;
> +	}
> +	down_read(&mm->mmap_sem);
> +	ret = vaddr_ptrs_check(vaddr_ptrs);
> +	if (ret)
> +		goto end;
> +	preempt_disable();
> +	if (cpu != smp_processor_id()) {
> +		preempt_enable();
> +		up_read(&mm->mmap_sem);
> +		goto retry;
> +	}

If we have a higher priority task/s either pinned to the cpu, don't we end
up busy-looping till the task exits/sleeps?

> +	ret = __do_cpu_opv(cpuop, cpuopcnt);
> +	preempt_enable();
> +end:
> +	up_read(&mm->mmap_sem);
> +	return ret;
> +
> +check_online:
> +	/*
> +	 * push_task_to_cpu() returns -EINVAL if the requested cpu is not part
> +	 * of the current thread's cpus_allowed mask.
> +	 */
> +	if (ret == -EINVAL)
> +		return ret;
> +	get_online_cpus();
> +	if (cpu_online(cpu)) {
> +		put_online_cpus();
> +		goto retry;
> +	}
Re: [RFC PATCH for 4.21 06/16] cpu_opv: Provide cpu_opv system call (v8)
Hi Mathieu,

On (10/16/18 15:17), Mathieu Desnoyers wrote:
>
> Therefore, only an internal kernel bug between vm_map_user_ram() and
> vm_unmap_user_ram() should trigger the BUG_ON(). No user input is passed
> to vm_unmap_user_ram().
>
> Now, let's look at vm_map_user_ram(). It calls alloc_vmap_area(), which
> returns a vmap_area. Then if vmap_page_range() failed, vm_unmap_user_ram()
> is called on the memory that has just been returned by vm_map_user_ram().
> Again, only an internal bug between map/unmap can trigger the BUG_ON() in
> vm_unmap_user_ram().

Thanks for spending time on this. Just wanted someone to have an extra look
at syscall->BUG_ON().

	-ss
Re: [RFC PATCH for 4.21 06/16] cpu_opv: Provide cpu_opv system call (v8)
----- On Oct 16, 2018, at 4:10 AM, Sergey Senozhatsky sergey.senozhatsky.w...@gmail.com wrote:

> Hi Mathieu,
>
> On (10/10/18 15:19), Mathieu Desnoyers wrote:
> [..]
>> +SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
>> +		int, cpu, int, flags)
>> +{
> [..]
>> +again:
>> +	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt, &vaddr_ptrs);
>> +	if (ret)
>> +		goto end;
>> +	ret = do_cpu_opv(cpuopv, cpuopcnt, &vaddr_ptrs, cpu);
>> +	if (ret == -EAGAIN)
>> +		retry = true;
>> +end:
>> +	for (i = 0; i < vaddr_ptrs.nr_vaddr; i++) {
>> +		struct vaddr *vaddr = &vaddr_ptrs.addr[i];
>> +		int j;
>> +
>> +		vm_unmap_user_ram((void *)vaddr->mem, vaddr->nr_pages);
>
> A dumb question.
>
> Both vm_unmap_user_ram() and vm_map_user_ram() can BUG_ON().
> So this is
> userspace -> syscall -> cpu_opv() -> vm_unmap_user_ram() -> BUG_ON()
>
> Any chance someone can exploit it?

Hi Sergey,

Let's look at vm_unmap_user_ram() and vm_map_user_ram() separately.

If we look at the input to vm_unmap_user_ram(), it's called with the
following parameters by the cpu_opv system call:

	for (i = 0; i < vaddr_ptrs.nr_vaddr; i++) {
		struct vaddr *vaddr = &vaddr_ptrs.addr[i];
		int j;

		vm_unmap_user_ram((void *)vaddr->mem, vaddr->nr_pages);
		[...]
	}

The vaddr_ptrs array content is filled by the call to cpu_opv_pin_pages()
above:

	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt, &vaddr_ptrs);
	if (ret)
		goto end;

by passing the array to cpu_op_pin_pages(), which appends a virtual address
at the end of the array (on success) and increments nr_vaddr. Those virtual
addresses are returned by vm_map_user_ram(), so they are not user-controlled.

Therefore, only an internal kernel bug between vm_map_user_ram() and
vm_unmap_user_ram() should trigger the BUG_ON(). No user input is passed
to vm_unmap_user_ram().

Now, let's look at vm_map_user_ram(). It calls alloc_vmap_area(), which
returns a vmap_area. Then, if vmap_page_range() failed, vm_unmap_user_ram()
is called on the memory that has just been returned by vm_map_user_ram().
Again, only an internal bug between map/unmap can trigger the BUG_ON() in
vm_unmap_user_ram().

Is there another scenario I missed?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Re: [RFC PATCH for 4.21 06/16] cpu_opv: Provide cpu_opv system call (v8)
Hi Mathieu,

On (10/10/18 15:19), Mathieu Desnoyers wrote:
[..]
> +SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
> +		int, cpu, int, flags)
> +{
[..]
> +again:
> +	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt, &vaddr_ptrs);
> +	if (ret)
> +		goto end;
> +	ret = do_cpu_opv(cpuopv, cpuopcnt, &vaddr_ptrs, cpu);
> +	if (ret == -EAGAIN)
> +		retry = true;
> +end:
> +	for (i = 0; i < vaddr_ptrs.nr_vaddr; i++) {
> +		struct vaddr *vaddr = &vaddr_ptrs.addr[i];
> +		int j;
> +
> +		vm_unmap_user_ram((void *)vaddr->mem, vaddr->nr_pages);

A dumb question.

Both vm_unmap_user_ram() and vm_map_user_ram() can BUG_ON(). So this is

	userspace -> syscall -> cpu_opv() -> vm_unmap_user_ram() -> BUG_ON()

Any chance someone can exploit it?

	-ss