[PATCH] vt: change SGR 21 to follow the standards
From: Mike Frysinger <vap...@chromium.org> ECMA-48 [1] (aka ISO 6429) has defined SGR 21 as "doubly underlined" since at least March 1984. The Linux kernel has treated it as SGR 22 "normal intensity" since it was added in Linux-0.96b in June 1992. Before that, it was simply ignored. Other terminal emulators have either ignored it, or treat it as double underline now. xterm for example added support in its 304 release (May 2014) [2] where it was previously ignoring it. Changing this behavior shouldn't be an issue: - It isn't a named capability in ncurses's terminfo database, so no script is using libtinfo/libcurses to look this up, or using tput to query & output the right sequence. - Any script assuming SGR 21 will reset intensity in all terminals already do not work correctly on non-Linux VTs (including running under screen/tmux/etc...). - If someone has written a script that only runs in the Linux VT, and they're using SGR 21 (instead of SGR 22), the output should still be readable. imo it's important to change this as the Linux VT's non-conformance is sometimes used as an argument for other terminal emulators to not implement SGR 21 at all, or do so incorrectly. [1]: https://www.ecma-international.org/publications/standards/Ecma-048.htm [2]: https://github.com/ThomasDickey/xterm-snapshots/commit/2fd29cb98d214cb536bcafbee00bc73b3f1eeb9d Signed-off-by: Mike Frysinger <vap...@chromium.org> --- drivers/tty/vt/vt.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 88b902c525d7..16c0fcd6904a 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -1354,6 +1354,11 @@ static void csi_m(struct vc_data *vc) case 3: vc->vc_italic = 1; break; + case 21: + /* +* No console drivers support double underline, so +* convert it to a single underline. +*/ case 4: vc->vc_underline = 1; break; @@ -1389,7 +1394,6 @@ static void csi_m(struct vc_data *vc) vc->vc_disp_ctrl = 1; vc->vc_toggle_meta = 1; break; - case 21: case 22: vc->vc_intensity = 1; break; -- 2.16.1
[PATCH] vt: change SGR 21 to follow the standards
From: Mike Frysinger ECMA-48 [1] (aka ISO 6429) has defined SGR 21 as "doubly underlined" since at least March 1984. The Linux kernel has treated it as SGR 22 "normal intensity" since it was added in Linux-0.96b in June 1992. Before that, it was simply ignored. Other terminal emulators have either ignored it, or treat it as double underline now. xterm for example added support in its 304 release (May 2014) [2] where it was previously ignoring it. Changing this behavior shouldn't be an issue: - It isn't a named capability in ncurses's terminfo database, so no script is using libtinfo/libcurses to look this up, or using tput to query & output the right sequence. - Any script assuming SGR 21 will reset intensity in all terminals already do not work correctly on non-Linux VTs (including running under screen/tmux/etc...). - If someone has written a script that only runs in the Linux VT, and they're using SGR 21 (instead of SGR 22), the output should still be readable. imo it's important to change this as the Linux VT's non-conformance is sometimes used as an argument for other terminal emulators to not implement SGR 21 at all, or do so incorrectly. [1]: https://www.ecma-international.org/publications/standards/Ecma-048.htm [2]: https://github.com/ThomasDickey/xterm-snapshots/commit/2fd29cb98d214cb536bcafbee00bc73b3f1eeb9d Signed-off-by: Mike Frysinger --- drivers/tty/vt/vt.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 88b902c525d7..16c0fcd6904a 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -1354,6 +1354,11 @@ static void csi_m(struct vc_data *vc) case 3: vc->vc_italic = 1; break; + case 21: + /* +* No console drivers support double underline, so +* convert it to a single underline. +*/ case 4: vc->vc_underline = 1; break; @@ -1389,7 +1394,6 @@ static void csi_m(struct vc_data *vc) vc->vc_disp_ctrl = 1; vc->vc_toggle_meta = 1; break; - case 21: case 22: vc->vc_intensity = 1; break; -- 2.16.1
Re: [PATCH] groups: don't return unmapped gids in getgroups(2)
On Fri, Feb 17, 2017 at 12:53 PM, Aleksa Sarai wrote: > One thing overlooked by commit 9cc46516ddf4 ("userns: Add a knob to > disable setgroups on a per user namespace basis") is that because > setgroups(2) no longer works in user namespaces it doesn't make any > sense to be returning weird group IDs that the process cannot do > anything with. > >>> > >>> > > > >> bool DropPrivileges() > >> { > >> /* ... */ > >> // Verify that the user isn't still in any supplementary groups > > > > But the user *is* still in a supplementary group. Your proposed > > change would break the intent of this code. > > I was about to say that "being in an unmapped supplementary group does > not count as privileges", but decided to test it first and realised that > this is not true? How is this not a blatant security vulnerability? whether they're mapped is irrelevant. if you make modifications to the FS, the permission check counts against the meaning of the ids in the parent namespaces. otherwise your argument is like saying "i mapped my UID to 0 in this new usernamespace, so how is that not a blatant security vuln". if you started out with access to the group, then keeping that access doesn't grant you any new privs you didn't already have. > % touch somefile > % chmod 660 somefile > % sudo chown root:wheel somefile > % unshare -r > % cat somefile > % # no EACCES... you didn't show `id` before the `unshare`. if you're already in the wheel group, what you've shown here is not a bug. -mike
Re: [PATCH] groups: don't return unmapped gids in getgroups(2)
On Fri, Feb 17, 2017 at 12:53 PM, Aleksa Sarai wrote: > One thing overlooked by commit 9cc46516ddf4 ("userns: Add a knob to > disable setgroups on a per user namespace basis") is that because > setgroups(2) no longer works in user namespaces it doesn't make any > sense to be returning weird group IDs that the process cannot do > anything with. > >>> > >>> > > > >> bool DropPrivileges() > >> { > >> /* ... */ > >> // Verify that the user isn't still in any supplementary groups > > > > But the user *is* still in a supplementary group. Your proposed > > change would break the intent of this code. > > I was about to say that "being in an unmapped supplementary group does > not count as privileges", but decided to test it first and realised that > this is not true? How is this not a blatant security vulnerability? whether they're mapped is irrelevant. if you make modifications to the FS, the permission check counts against the meaning of the ids in the parent namespaces. otherwise your argument is like saying "i mapped my UID to 0 in this new usernamespace, so how is that not a blatant security vuln". if you started out with access to the group, then keeping that access doesn't grant you any new privs you didn't already have. > % touch somefile > % chmod 660 somefile > % sudo chown root:wheel somefile > % unshare -r > % cat somefile > % # no EACCES... you didn't show `id` before the `unshare`. if you're already in the wheel group, what you've shown here is not a bug. -mike
Re: seccomp: dump core when using SECCOMP_RET_KILL
thank for the testcase. i'll take a look. when i went through the code visually, i didn't think it was killing, just suspending+resuming for the sake of snapshotting, but i must have misread. -mike On Tue, Jan 24, 2017 at 2:53 PM, Andrei Vagin <ava...@virtuozzo.com> wrote: > Hi, > > One of CRIU tests fails with this patch: > https://github.com/xemul/criu/blob/master/test/zdtm/static/seccomp_filter_tsync.c > > Before this patch only a thread which called a "wrong" syscall is killed. > Now a whole process is killed if one of threads called a "wrong" syscall. > > Before this patch only one thread is killed: > > 512 seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, {len=4, > filter=0x7fffe111fb10} > 484 kill(30, SIG_0) = 0 > 484 write(1, "Wait for zdtm/static/seccomp_filter_tsync(30) to die for > 0.10\n", 66 > 512 <... seccomp resumed> ) = 0 > 512 futex(0x606420, FUTEX_WAKE_PRIVATE, 1 > 484 <... write resumed> ) = 66 > 512 <... futex resumed> ) = 1 > 484 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10} > 512 futex(0x7f9e894a19d0, FUTEX_WAIT, 32, NULL > 513 <... futex resumed> ) = 0 > 513 futex(0x606420, FUTEX_WAKE_PRIVATE, 1) = 0 > 513 ptrace(PTRACE_TRACEME)= ? > 513 +++ killed by SIGSYS +++ > 512 <... futex resumed> ) = 0 > > After this patch a whole process is killed: > > 767 seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, {len=4, > filter=0x7ffdeedbcd20}) = 0 > 767 futex(0x606420, FUTEX_WAKE_PRIVATE, 1) = 1 > 768 <... futex resumed> ) = 0 > 767 futex(0x7fab05b229d0, FUTEX_WAIT, 32, NULL > 768 futex(0x606420, FUTEX_WAKE_PRIVATE, 1) = 0 > 768 ptrace(PTRACE_TRACEME > 767 <... futex resumed>) = ? > 768 <... ptrace resumed>) = ? > 768 +++ killed by SIGSYS (core dumped) +++ > 767 +++ killed by SIGSYS (core dumped) +++ > 766 <... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSYS && > WCOREDUMP(s)}], 0, NULL) = 31 > 766 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=31, si_ > > Steps to reproduce: > $ git clone git://github.com/xemul/criu > $ cd criu/test/zdtm/static/ > $ make seccomp_filter_tsync.out > $ cat seccomp_filter_tsync.out > > On Thu, Jan 19, 2017 at 10:28:57PM -0600, Mike Frysinger wrote: >> From: Mike Frysinger <vap...@chromium.org> >> >> The SECCOMP_RET_KILL mode is documented as immediately killing the >> process as if a SIGSYS had been sent and not caught (similar to a >> SIGKILL). However, a SIGSYS is documented as triggering a coredump >> which does not happen today. >> >> This has the advantage of being able to more easily debug a process >> that fails a seccomp filter. Today, most apps need to recompile and >> change their filter in order to get detailed info out, or manually run >> things through strace, or enable detailed kernel auditing. Now we get >> coredumps that fit into existing system-wide crash reporting setups. >> >> >From a security pov, this shouldn't be a problem. Unhandled signals >> can already be sent externally which trigger a coredump independent of >> the status of the seccomp filter. The act of dumping core itself does >> not cause change in execution of the program. >> >> URL: https://crbug.com/676357 >> Signed-off-by: Mike Frysinger <vap...@chromium.org> >> Acked-by: Jorge Lucangeli Obes <jorg...@chromium.org> >> Acked-by: Kees Cook <keesc...@chromium.org> >> Acked-by: Kees Cook <keesc...@chromium.org> >> --- >> kernel/seccomp.c | 29 + >> 1 file changed, 21 insertions(+), 8 deletions(-) >> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c >> index f7ce79a46050..f8f88ebcb3ba 100644 >> --- a/kernel/seccomp.c >> +++ b/kernel/seccomp.c >> @@ -16,6 +16,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -486,6 +487,17 @@ void put_seccomp_filter(struct task_struct *tsk) >> } >> } >> >> +static void seccomp_init_siginfo(siginfo_t *info, int syscall, int reason) >> +{ >> + memset(info, 0, sizeof(*info)); >> + info->si_signo = SIGSYS; >> + info->si_code = SYS_SECCOMP; >> + info->si_call_addr = (void __user *)KSTK_EIP(current); >> + info->si_errno = reason; >> + info->si_arch = syscall_get_arch(); >> + info->
Re: seccomp: dump core when using SECCOMP_RET_KILL
thank for the testcase. i'll take a look. when i went through the code visually, i didn't think it was killing, just suspending+resuming for the sake of snapshotting, but i must have misread. -mike On Tue, Jan 24, 2017 at 2:53 PM, Andrei Vagin wrote: > Hi, > > One of CRIU tests fails with this patch: > https://github.com/xemul/criu/blob/master/test/zdtm/static/seccomp_filter_tsync.c > > Before this patch only a thread which called a "wrong" syscall is killed. > Now a whole process is killed if one of threads called a "wrong" syscall. > > Before this patch only one thread is killed: > > 512 seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, {len=4, > filter=0x7fffe111fb10} > 484 kill(30, SIG_0) = 0 > 484 write(1, "Wait for zdtm/static/seccomp_filter_tsync(30) to die for > 0.10\n", 66 > 512 <... seccomp resumed> ) = 0 > 512 futex(0x606420, FUTEX_WAKE_PRIVATE, 1 > 484 <... write resumed> ) = 66 > 512 <... futex resumed> ) = 1 > 484 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10} > 512 futex(0x7f9e894a19d0, FUTEX_WAIT, 32, NULL > 513 <... futex resumed> ) = 0 > 513 futex(0x606420, FUTEX_WAKE_PRIVATE, 1) = 0 > 513 ptrace(PTRACE_TRACEME)= ? > 513 +++ killed by SIGSYS +++ > 512 <... futex resumed> ) = 0 > > After this patch a whole process is killed: > > 767 seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, {len=4, > filter=0x7ffdeedbcd20}) = 0 > 767 futex(0x606420, FUTEX_WAKE_PRIVATE, 1) = 1 > 768 <... futex resumed> ) = 0 > 767 futex(0x7fab05b229d0, FUTEX_WAIT, 32, NULL > 768 futex(0x606420, FUTEX_WAKE_PRIVATE, 1) = 0 > 768 ptrace(PTRACE_TRACEME > 767 <... futex resumed>) = ? > 768 <... ptrace resumed>) = ? > 768 +++ killed by SIGSYS (core dumped) +++ > 767 +++ killed by SIGSYS (core dumped) +++ > 766 <... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSYS && > WCOREDUMP(s)}], 0, NULL) = 31 > 766 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=31, si_ > > Steps to reproduce: > $ git clone git://github.com/xemul/criu > $ cd criu/test/zdtm/static/ > $ make seccomp_filter_tsync.out > $ cat seccomp_filter_tsync.out > > On Thu, Jan 19, 2017 at 10:28:57PM -0600, Mike Frysinger wrote: >> From: Mike Frysinger >> >> The SECCOMP_RET_KILL mode is documented as immediately killing the >> process as if a SIGSYS had been sent and not caught (similar to a >> SIGKILL). However, a SIGSYS is documented as triggering a coredump >> which does not happen today. >> >> This has the advantage of being able to more easily debug a process >> that fails a seccomp filter. Today, most apps need to recompile and >> change their filter in order to get detailed info out, or manually run >> things through strace, or enable detailed kernel auditing. Now we get >> coredumps that fit into existing system-wide crash reporting setups. >> >> >From a security pov, this shouldn't be a problem. Unhandled signals >> can already be sent externally which trigger a coredump independent of >> the status of the seccomp filter. The act of dumping core itself does >> not cause change in execution of the program. >> >> URL: https://crbug.com/676357 >> Signed-off-by: Mike Frysinger >> Acked-by: Jorge Lucangeli Obes >> Acked-by: Kees Cook >> Acked-by: Kees Cook >> --- >> kernel/seccomp.c | 29 + >> 1 file changed, 21 insertions(+), 8 deletions(-) >> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c >> index f7ce79a46050..f8f88ebcb3ba 100644 >> --- a/kernel/seccomp.c >> +++ b/kernel/seccomp.c >> @@ -16,6 +16,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -486,6 +487,17 @@ void put_seccomp_filter(struct task_struct *tsk) >> } >> } >> >> +static void seccomp_init_siginfo(siginfo_t *info, int syscall, int reason) >> +{ >> + memset(info, 0, sizeof(*info)); >> + info->si_signo = SIGSYS; >> + info->si_code = SYS_SECCOMP; >> + info->si_call_addr = (void __user *)KSTK_EIP(current); >> + info->si_errno = reason; >> + info->si_arch = syscall_get_arch(); >> + info->si_syscall = syscall; >> +} >> + >> /** >> * seccomp_send_sigsys - signals the task to allow in-process syscall >> emulatio
Re: [PATCH] seccomp: dump core when using SECCOMP_RET_KILL
On Sun, Jan 22, 2017 at 4:10 PM, James Morris wrote: > On Fri, 20 Jan 2017, Kees Cook wrote: > > Yup, I think this is fine. The additional kernel code executed before > > the do_exit() is relatively limited, and is equivalent to leaving > > kill(self, SIGSEGV) exposed in a seccomp filter. Setting an RLIMIT is > > also sufficient to block the core generation, so really paranoid > > environments can still do that. > > > > The forwarded ack stands: > > > > Acked-by: Kees Cook> > > > James, can you add this to your tree? > > Mike, please resend the patch, I don't have it. looks like patchwork grabbed it: https://patchwork.kernel.org/patch/9527359/ has a mbox link which should get you what you need ? -mike
Re: [PATCH] seccomp: dump core when using SECCOMP_RET_KILL
On Sun, Jan 22, 2017 at 4:10 PM, James Morris wrote: > On Fri, 20 Jan 2017, Kees Cook wrote: > > Yup, I think this is fine. The additional kernel code executed before > > the do_exit() is relatively limited, and is equivalent to leaving > > kill(self, SIGSEGV) exposed in a seccomp filter. Setting an RLIMIT is > > also sufficient to block the core generation, so really paranoid > > environments can still do that. > > > > The forwarded ack stands: > > > > Acked-by: Kees Cook > > > > James, can you add this to your tree? > > Mike, please resend the patch, I don't have it. looks like patchwork grabbed it: https://patchwork.kernel.org/patch/9527359/ has a mbox link which should get you what you need ? -mike
[PATCH] seccomp: dump core when using SECCOMP_RET_KILL
From: Mike Frysinger <vap...@chromium.org> The SECCOMP_RET_KILL mode is documented as immediately killing the process as if a SIGSYS had been sent and not caught (similar to a SIGKILL). However, a SIGSYS is documented as triggering a coredump which does not happen today. This has the advantage of being able to more easily debug a process that fails a seccomp filter. Today, most apps need to recompile and change their filter in order to get detailed info out, or manually run things through strace, or enable detailed kernel auditing. Now we get coredumps that fit into existing system-wide crash reporting setups. >From a security pov, this shouldn't be a problem. Unhandled signals can already be sent externally which trigger a coredump independent of the status of the seccomp filter. The act of dumping core itself does not cause change in execution of the program. URL: https://crbug.com/676357 Signed-off-by: Mike Frysinger <vap...@chromium.org> Acked-by: Jorge Lucangeli Obes <jorg...@chromium.org> Acked-by: Kees Cook <keesc...@chromium.org> --- kernel/seccomp.c | 29 + 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/kernel/seccomp.c b/kernel/seccomp.c index f7ce79a46050..f8f88ebcb3ba 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -486,6 +487,17 @@ void put_seccomp_filter(struct task_struct *tsk) } } +static void seccomp_init_siginfo(siginfo_t *info, int syscall, int reason) +{ + memset(info, 0, sizeof(*info)); + info->si_signo = SIGSYS; + info->si_code = SYS_SECCOMP; + info->si_call_addr = (void __user *)KSTK_EIP(current); + info->si_errno = reason; + info->si_arch = syscall_get_arch(); + info->si_syscall = syscall; +} + /** * seccomp_send_sigsys - signals the task to allow in-process syscall emulation * @syscall: syscall number to send to userland @@ -496,13 +508,7 @@ void put_seccomp_filter(struct task_struct *tsk) static void seccomp_send_sigsys(int syscall, int reason) { struct siginfo info; - memset(, 0, sizeof(info)); - info.si_signo = SIGSYS; - info.si_code = SYS_SECCOMP; - info.si_call_addr = (void __user *)KSTK_EIP(current); - info.si_errno = reason; - info.si_arch = syscall_get_arch(); - info.si_syscall = syscall; + seccomp_init_siginfo(, syscall, reason); force_sig_info(SIGSYS, , current); } #endif /* CONFIG_SECCOMP_FILTER */ @@ -634,10 +640,17 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, return 0; case SECCOMP_RET_KILL: - default: + default: { + siginfo_t info; audit_seccomp(this_syscall, SIGSYS, action); + /* Show the original registers in the dump. */ + syscall_rollback(current, task_pt_regs(current)); + /* Trigger a manual coredump since do_exit skips it. */ + seccomp_init_siginfo(, this_syscall, data); + do_coredump(); do_exit(SIGSYS); } + } unreachable(); -- 2.11.0
[PATCH] seccomp: dump core when using SECCOMP_RET_KILL
From: Mike Frysinger The SECCOMP_RET_KILL mode is documented as immediately killing the process as if a SIGSYS had been sent and not caught (similar to a SIGKILL). However, a SIGSYS is documented as triggering a coredump which does not happen today. This has the advantage of being able to more easily debug a process that fails a seccomp filter. Today, most apps need to recompile and change their filter in order to get detailed info out, or manually run things through strace, or enable detailed kernel auditing. Now we get coredumps that fit into existing system-wide crash reporting setups. >From a security pov, this shouldn't be a problem. Unhandled signals can already be sent externally which trigger a coredump independent of the status of the seccomp filter. The act of dumping core itself does not cause change in execution of the program. URL: https://crbug.com/676357 Signed-off-by: Mike Frysinger Acked-by: Jorge Lucangeli Obes Acked-by: Kees Cook --- kernel/seccomp.c | 29 + 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/kernel/seccomp.c b/kernel/seccomp.c index f7ce79a46050..f8f88ebcb3ba 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -486,6 +487,17 @@ void put_seccomp_filter(struct task_struct *tsk) } } +static void seccomp_init_siginfo(siginfo_t *info, int syscall, int reason) +{ + memset(info, 0, sizeof(*info)); + info->si_signo = SIGSYS; + info->si_code = SYS_SECCOMP; + info->si_call_addr = (void __user *)KSTK_EIP(current); + info->si_errno = reason; + info->si_arch = syscall_get_arch(); + info->si_syscall = syscall; +} + /** * seccomp_send_sigsys - signals the task to allow in-process syscall emulation * @syscall: syscall number to send to userland @@ -496,13 +508,7 @@ void put_seccomp_filter(struct task_struct *tsk) static void seccomp_send_sigsys(int syscall, int reason) { struct siginfo info; - memset(, 0, sizeof(info)); - info.si_signo = SIGSYS; - info.si_code = SYS_SECCOMP; - info.si_call_addr = (void __user *)KSTK_EIP(current); - info.si_errno = reason; - info.si_arch = syscall_get_arch(); - info.si_syscall = syscall; + seccomp_init_siginfo(, syscall, reason); force_sig_info(SIGSYS, , current); } #endif /* CONFIG_SECCOMP_FILTER */ @@ -634,10 +640,17 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, return 0; case SECCOMP_RET_KILL: - default: + default: { + siginfo_t info; audit_seccomp(this_syscall, SIGSYS, action); + /* Show the original registers in the dump. */ + syscall_rollback(current, task_pt_regs(current)); + /* Trigger a manual coredump since do_exit skips it. */ + seccomp_init_siginfo(, this_syscall, data); + do_coredump(); do_exit(SIGSYS); } + } unreachable(); -- 2.11.0
[PATCH] uapi: mqueue.h: add missing linux/types.h include
From: Mike Frysinger <vap...@chromium.org> Commit 63159f5dcccb3858d88aaef800c4ee0eb4cc8577 changed the types from long to __kernel_long_t, but didn't add a linux/types.h include. Code that tries to include this header directly breaks: /usr/include/linux/mqueue.h:26:2: error: unknown type name '__kernel_long_t' __kernel_long_t mq_flags; /* message queue flags */ This also upsets configure tests for this header: checking linux/mqueue.h usability... no checking linux/mqueue.h presence... yes configure: WARNING: linux/mqueue.h: present but cannot be compiled configure: WARNING: linux/mqueue.h: check for missing prerequisite headers? configure: WARNING: linux/mqueue.h: see the Autoconf documentation configure: WARNING: linux/mqueue.h: section "Present But Cannot Be Compiled" configure: WARNING: linux/mqueue.h: proceeding with the compiler's result checking for linux/mqueue.h... no Signed-off-by: Mike Frysinger <vap...@gentoo.org> --- include/uapi/linux/mqueue.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h index d0a2b8e89813..bbd5116ea739 100644 --- a/include/uapi/linux/mqueue.h +++ b/include/uapi/linux/mqueue.h @@ -18,6 +18,8 @@ #ifndef _LINUX_MQUEUE_H #define _LINUX_MQUEUE_H +#include + #define MQ_PRIO_MAX32768 /* per-uid limit of kernel memory used by mqueue, in bytes */ #define MQ_BYTES_MAX 819200 -- 2.11.0
[PATCH] uapi: mqueue.h: add missing linux/types.h include
From: Mike Frysinger Commit 63159f5dcccb3858d88aaef800c4ee0eb4cc8577 changed the types from long to __kernel_long_t, but didn't add a linux/types.h include. Code that tries to include this header directly breaks: /usr/include/linux/mqueue.h:26:2: error: unknown type name '__kernel_long_t' __kernel_long_t mq_flags; /* message queue flags */ This also upsets configure tests for this header: checking linux/mqueue.h usability... no checking linux/mqueue.h presence... yes configure: WARNING: linux/mqueue.h: present but cannot be compiled configure: WARNING: linux/mqueue.h: check for missing prerequisite headers? configure: WARNING: linux/mqueue.h: see the Autoconf documentation configure: WARNING: linux/mqueue.h: section "Present But Cannot Be Compiled" configure: WARNING: linux/mqueue.h: proceeding with the compiler's result checking for linux/mqueue.h... no Signed-off-by: Mike Frysinger --- include/uapi/linux/mqueue.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h index d0a2b8e89813..bbd5116ea739 100644 --- a/include/uapi/linux/mqueue.h +++ b/include/uapi/linux/mqueue.h @@ -18,6 +18,8 @@ #ifndef _LINUX_MQUEUE_H #define _LINUX_MQUEUE_H +#include + #define MQ_PRIO_MAX32768 /* per-uid limit of kernel memory used by mqueue, in bytes */ #define MQ_BYTES_MAX 819200 -- 2.11.0
[PATCH] timerfd: export defines to userspace
Since userspace is expected to call timerfd syscalls directly with these flags/ioctls, make sure we export them so they don't have to duplicate the values themselves. Acked-by: Thomas Gleixner <t...@linutronix.de> Signed-off-by: Mike Frysinger <vap...@gentoo.org> --- ping -- this was sent about two years ago now include/linux/timerfd.h | 20 +--- include/uapi/linux/Kbuild| 1 + include/uapi/linux/timerfd.h | 36 3 files changed, 38 insertions(+), 19 deletions(-) create mode 100644 include/uapi/linux/timerfd.h diff --git a/include/linux/timerfd.h b/include/linux/timerfd.h index bd36ce431e32..bab0b1ad0613 100644 --- a/include/linux/timerfd.h +++ b/include/linux/timerfd.h @@ -8,23 +8,7 @@ #ifndef _LINUX_TIMERFD_H #define _LINUX_TIMERFD_H -/* For O_CLOEXEC and O_NONBLOCK */ -#include - -/* For _IO helpers */ -#include - -/* - * CAREFUL: Check include/asm-generic/fcntl.h when defining - * new flags, since they might collide with O_* ones. We want - * to re-use O_* flags that couldn't possibly have a meaning - * from eventfd, in order to leave a free define-space for - * shared O_* flags. - */ -#define TFD_TIMER_ABSTIME (1 << 0) -#define TFD_TIMER_CANCEL_ON_SET (1 << 1) -#define TFD_CLOEXEC O_CLOEXEC -#define TFD_NONBLOCK O_NONBLOCK +#include #define TFD_SHARED_FCNTL_FLAGS (TFD_CLOEXEC | TFD_NONBLOCK) /* Flags for timerfd_create. */ @@ -32,6 +16,4 @@ /* Flags for timerfd_settime. */ #define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET) -#define TFD_IOC_SET_TICKS _IOW('T', 0, u64) - #endif /* _LINUX_TIMERFD_H */ diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild index cd2be1c8e9fb..9c4d25ed8e1e 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -411,6 +411,7 @@ header-y += telephony.h header-y += termios.h header-y += thermal.h header-y += time.h +header-y += timerfd.h header-y += times.h header-y += timex.h header-y += tiocl.h diff --git a/include/uapi/linux/timerfd.h b/include/uapi/linux/timerfd.h new file mode 100644 index ..6fcfaa8da173 --- /dev/null +++ b/include/uapi/linux/timerfd.h @@ -0,0 +1,36 @@ +/* + * include/linux/timerfd.h + * + * Copyright (C) 2007 Davide Libenzi <davi...@xmailserver.org> + * + */ + +#ifndef _UAPI_LINUX_TIMERFD_H +#define _UAPI_LINUX_TIMERFD_H + +#include + +/* For O_CLOEXEC and O_NONBLOCK */ +#include + +/* For _IO helpers */ +#include + +/* + * CAREFUL: Check include/asm-generic/fcntl.h when defining + * new flags, since they might collide with O_* ones. We want + * to re-use O_* flags that couldn't possibly have a meaning + * from eventfd, in order to leave a free define-space for + * shared O_* flags. + * + * Also make sure to update the masks in include/linux/timerfd.h + * when adding new flags. + */ +#define TFD_TIMER_ABSTIME (1 << 0) +#define TFD_TIMER_CANCEL_ON_SET (1 << 1) +#define TFD_CLOEXEC O_CLOEXEC +#define TFD_NONBLOCK O_NONBLOCK + +#define TFD_IOC_SET_TICKS _IOW('T', 0, __u64) + +#endif /* _UAPI_LINUX_TIMERFD_H */ -- 2.11.0.rc2
[PATCH] timerfd: export defines to userspace
Since userspace is expected to call timerfd syscalls directly with these flags/ioctls, make sure we export them so they don't have to duplicate the values themselves. Acked-by: Thomas Gleixner Signed-off-by: Mike Frysinger --- ping -- this was sent about two years ago now include/linux/timerfd.h | 20 +--- include/uapi/linux/Kbuild| 1 + include/uapi/linux/timerfd.h | 36 3 files changed, 38 insertions(+), 19 deletions(-) create mode 100644 include/uapi/linux/timerfd.h diff --git a/include/linux/timerfd.h b/include/linux/timerfd.h index bd36ce431e32..bab0b1ad0613 100644 --- a/include/linux/timerfd.h +++ b/include/linux/timerfd.h @@ -8,23 +8,7 @@ #ifndef _LINUX_TIMERFD_H #define _LINUX_TIMERFD_H -/* For O_CLOEXEC and O_NONBLOCK */ -#include - -/* For _IO helpers */ -#include - -/* - * CAREFUL: Check include/asm-generic/fcntl.h when defining - * new flags, since they might collide with O_* ones. We want - * to re-use O_* flags that couldn't possibly have a meaning - * from eventfd, in order to leave a free define-space for - * shared O_* flags. - */ -#define TFD_TIMER_ABSTIME (1 << 0) -#define TFD_TIMER_CANCEL_ON_SET (1 << 1) -#define TFD_CLOEXEC O_CLOEXEC -#define TFD_NONBLOCK O_NONBLOCK +#include #define TFD_SHARED_FCNTL_FLAGS (TFD_CLOEXEC | TFD_NONBLOCK) /* Flags for timerfd_create. */ @@ -32,6 +16,4 @@ /* Flags for timerfd_settime. */ #define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET) -#define TFD_IOC_SET_TICKS _IOW('T', 0, u64) - #endif /* _LINUX_TIMERFD_H */ diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild index cd2be1c8e9fb..9c4d25ed8e1e 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -411,6 +411,7 @@ header-y += telephony.h header-y += termios.h header-y += thermal.h header-y += time.h +header-y += timerfd.h header-y += times.h header-y += timex.h header-y += tiocl.h diff --git a/include/uapi/linux/timerfd.h b/include/uapi/linux/timerfd.h new file mode 100644 index ..6fcfaa8da173 --- /dev/null +++ b/include/uapi/linux/timerfd.h @@ -0,0 +1,36 @@ +/* + * include/linux/timerfd.h + * + * Copyright (C) 2007 Davide Libenzi + * + */ + +#ifndef _UAPI_LINUX_TIMERFD_H +#define _UAPI_LINUX_TIMERFD_H + +#include + +/* For O_CLOEXEC and O_NONBLOCK */ +#include + +/* For _IO helpers */ +#include + +/* + * CAREFUL: Check include/asm-generic/fcntl.h when defining + * new flags, since they might collide with O_* ones. We want + * to re-use O_* flags that couldn't possibly have a meaning + * from eventfd, in order to leave a free define-space for + * shared O_* flags. + * + * Also make sure to update the masks in include/linux/timerfd.h + * when adding new flags. + */ +#define TFD_TIMER_ABSTIME (1 << 0) +#define TFD_TIMER_CANCEL_ON_SET (1 << 1) +#define TFD_CLOEXEC O_CLOEXEC +#define TFD_NONBLOCK O_NONBLOCK + +#define TFD_IOC_SET_TICKS _IOW('T', 0, __u64) + +#endif /* _UAPI_LINUX_TIMERFD_H */ -- 2.11.0.rc2
Re: [glibc] preadv/pwritev question
On 31 May 2016 17:00, Chris Metcalf wrote: > On 5/31/2016 4:04 PM, Yury Norov wrote: > > In path a63c7fa18a (Add sysdeps/unix/sysv/linux/generic/.) you add > > this: > > +++ b/sysdeps/unix/sysv/linux/generic/wordsize-32/preadv.c > > > > [...] > > > > +static ssize_t > > +do_preadv (int fd, const struct iovec *vector, int count, off_t > > offset) > > +{ > > + assert (sizeof (offset) == 4); > > + return INLINE_SYSCALL (preadv, __ALIGNMENT_COUNT (5, 6), fd, > > + vector, count, __ALIGNMENT_ARG > > + __LONG_LONG_PAIR (offset >> 31, offset)); > > +} > > + > > > > And this is the code that is picked up if I choose wordsize-32 for my > > AARCH64/ILP32. So I have questions. > > > > 1. What is the assert for? We agreed that all new ABIs will be 64-bit > > off_t only. > > > > I fixed it internally like this: > > +#ifndef __OFF_T_MATCHES_OFF64_T > > assert (sizeof (offset) == 4); > > +#endif > > > > There is a bunch of similar assertions in glibc. > > > > 2. This one looks weird: > > __LONG_LONG_PAIR (offset >> 31, offset)) > > Why 31-bit offset? And why you don't mask 2nd argument? > > Later in your patch I see this: > > +static ssize_t > > +do_preadv64 (int fd, const struct iovec *vector, int count, off64_t > > offset) > > > > +{ > > + return INLINE_SYSCALL (preadv, __ALIGNMENT_COUNT (5, 6), fd, > > + vector, count, __ALIGNMENT_ARG > > + __LONG_LONG_PAIR ((off_t) (offset >> 32), > > + (off_t) (offset & 0x))); > > +} > > > > And it looks correct to me. If 1st version is correct as well, I think > > it should be commented. > > I did this work before x32 came out, so I tried to model it more closely on > the existing x86 compat API. I agree that a 64-bit off_t model seems > reasonable; > however, the code does exactly what it does to match x86, namely preadv() > takes > a 32-bit offset, and preadv64() take a 64-bit offset. The assert() in preadv > to force > sizeof to be 4 is exactly why in that routine we use (offset >> 31, offset). > Since > we know offset fits in 32 bits, all we need to do is properly sign-extend it > into > 64 bits in the high register of the pair, which is what (offset >> 31) does - > you end > up with only 0 or -1, thus sign-extending the 32-bit signed off_t. Then in > preadv64() we actually need to break apart the 64-bit offset into a high 32 > bits > and a low 32 bits, which is what (offset >> 32, offset & 0x) does. > > For a 64-bit off_t you will want to not compile preadv.c at all, and instead > make > __libc_preadv() and friends be aliases of __libc_preadv64(). sounds like Adhemerval's pread/pwrite unify work should be extended to the preadv/pwritev funcs. it deals with the ilp32 case and uses the new SYSCALL_LL macro to deal with the ugly shifting/masking. check out these commits: https://sourceware.org/git/?p=glibc.git;a=commit;h=071af4769fcdfe2cd349157b01f27c9571478ace https://sourceware.org/git/?p=glibc.git;a=commit;h=77a4fbd53611720cd6ae845de560df5dd332b28e https://sourceware.org/git/?p=glibc.git;a=commit;h=eeddfa91cbb1a619af135c7a9ac14251ec094b7a -mike signature.asc Description: Digital signature
Re: [glibc] preadv/pwritev question
On 31 May 2016 17:00, Chris Metcalf wrote: > On 5/31/2016 4:04 PM, Yury Norov wrote: > > In path a63c7fa18a (Add sysdeps/unix/sysv/linux/generic/.) you add > > this: > > +++ b/sysdeps/unix/sysv/linux/generic/wordsize-32/preadv.c > > > > [...] > > > > +static ssize_t > > +do_preadv (int fd, const struct iovec *vector, int count, off_t > > offset) > > +{ > > + assert (sizeof (offset) == 4); > > + return INLINE_SYSCALL (preadv, __ALIGNMENT_COUNT (5, 6), fd, > > + vector, count, __ALIGNMENT_ARG > > + __LONG_LONG_PAIR (offset >> 31, offset)); > > +} > > + > > > > And this is the code that is picked up if I choose wordsize-32 for my > > AARCH64/ILP32. So I have questions. > > > > 1. What is the assert for? We agreed that all new ABIs will be 64-bit > > off_t only. > > > > I fixed it internally like this: > > +#ifndef __OFF_T_MATCHES_OFF64_T > > assert (sizeof (offset) == 4); > > +#endif > > > > There is a bunch of similar assertions in glibc. > > > > 2. This one looks weird: > > __LONG_LONG_PAIR (offset >> 31, offset)) > > Why 31-bit offset? And why you don't mask 2nd argument? > > Later in your patch I see this: > > +static ssize_t > > +do_preadv64 (int fd, const struct iovec *vector, int count, off64_t > > offset) > > > > +{ > > + return INLINE_SYSCALL (preadv, __ALIGNMENT_COUNT (5, 6), fd, > > + vector, count, __ALIGNMENT_ARG > > + __LONG_LONG_PAIR ((off_t) (offset >> 32), > > + (off_t) (offset & 0x))); > > +} > > > > And it looks correct to me. If 1st version is correct as well, I think > > it should be commented. > > I did this work before x32 came out, so I tried to model it more closely on > the existing x86 compat API. I agree that a 64-bit off_t model seems > reasonable; > however, the code does exactly what it does to match x86, namely preadv() > takes > a 32-bit offset, and preadv64() take a 64-bit offset. The assert() in preadv > to force > sizeof to be 4 is exactly why in that routine we use (offset >> 31, offset). > Since > we know offset fits in 32 bits, all we need to do is properly sign-extend it > into > 64 bits in the high register of the pair, which is what (offset >> 31) does - > you end > up with only 0 or -1, thus sign-extending the 32-bit signed off_t. Then in > preadv64() we actually need to break apart the 64-bit offset into a high 32 > bits > and a low 32 bits, which is what (offset >> 32, offset & 0x) does. > > For a 64-bit off_t you will want to not compile preadv.c at all, and instead > make > __libc_preadv() and friends be aliases of __libc_preadv64(). sounds like Adhemerval's pread/pwrite unify work should be extended to the preadv/pwritev funcs. it deals with the ilp32 case and uses the new SYSCALL_LL macro to deal with the ugly shifting/masking. check out these commits: https://sourceware.org/git/?p=glibc.git;a=commit;h=071af4769fcdfe2cd349157b01f27c9571478ace https://sourceware.org/git/?p=glibc.git;a=commit;h=77a4fbd53611720cd6ae845de560df5dd332b28e https://sourceware.org/git/?p=glibc.git;a=commit;h=eeddfa91cbb1a619af135c7a9ac14251ec094b7a -mike signature.asc Description: Digital signature
Re: [PATCH] parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls
On 27 Apr 2016 04:56, Dmitry V. Levin wrote: > Do not load one entry beyond the end of the syscall table when the > syscall number of a traced process equals to __NR_Linux_syscalls. > Similar bug with regular processes was fixed by commit 3bb457af4fa8 > ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls"). > > This bug was found by strace test suite. > > Cc: sta...@vger.kernel.org > Signed-off-by: Dmitry V. Levin> --- > arch/parisc/kernel/syscall.S | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S > index c976ebf..57b4836 100644 > --- a/arch/parisc/kernel/syscall.S > +++ b/arch/parisc/kernel/syscall.S > @@ -344,7 +344,7 @@ tracesys_next: > #endif > > cmpib,COND(=),n -1,%r20,tracesys_exit /* seccomp may have returned -1 */ > - comiclr,>>= __NR_Linux_syscalls, %r20, %r0 > + comiclr,>> __NR_Linux_syscalls, %r20, %r0 > b,n .Ltracesys_nosys > > LDREGX %r20(%r19), %r19 i've deployd your fix to hake, so feel free to give the tests another run to try and crash the box :). -mike signature.asc Description: Digital signature
Re: [PATCH] parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls
On 27 Apr 2016 04:56, Dmitry V. Levin wrote: > Do not load one entry beyond the end of the syscall table when the > syscall number of a traced process equals to __NR_Linux_syscalls. > Similar bug with regular processes was fixed by commit 3bb457af4fa8 > ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls"). > > This bug was found by strace test suite. > > Cc: sta...@vger.kernel.org > Signed-off-by: Dmitry V. Levin > --- > arch/parisc/kernel/syscall.S | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S > index c976ebf..57b4836 100644 > --- a/arch/parisc/kernel/syscall.S > +++ b/arch/parisc/kernel/syscall.S > @@ -344,7 +344,7 @@ tracesys_next: > #endif > > cmpib,COND(=),n -1,%r20,tracesys_exit /* seccomp may have returned -1 */ > - comiclr,>>= __NR_Linux_syscalls, %r20, %r0 > + comiclr,>> __NR_Linux_syscalls, %r20, %r0 > b,n .Ltracesys_nosys > > LDREGX %r20(%r19), %r19 i've deployd your fix to hake, so feel free to give the tests another run to try and crash the box :). -mike signature.asc Description: Digital signature
[PATCH] uapi: mqueue.h: add missing linux/types.h include
From: Mike Frysinger Commit 63159f5dcccb3858d88aaef800c4ee0eb4cc8577 changed the types from long to __kernel_long_t, but didn't add a linux/types.h include. Code that tries to include this header directly breaks: /usr/include/linux/mqueue.h:26:2: error: unknown type name '__kernel_long_t' __kernel_long_t mq_flags; /* message queue flags */ This also upsets configure tests for this header: checking linux/mqueue.h usability... no checking linux/mqueue.h presence... yes configure: WARNING: linux/mqueue.h: present but cannot be compiled configure: WARNING: linux/mqueue.h: check for missing prerequisite headers? configure: WARNING: linux/mqueue.h: see the Autoconf documentation configure: WARNING: linux/mqueue.h: section "Present But Cannot Be Compiled" configure: WARNING: linux/mqueue.h: proceeding with the compiler's result checking for linux/mqueue.h... no Signed-off-by: Mike Frysinger --- include/uapi/linux/mqueue.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h index d0a2b8e..bbd5116 100644 --- a/include/uapi/linux/mqueue.h +++ b/include/uapi/linux/mqueue.h @@ -18,6 +18,8 @@ #ifndef _LINUX_MQUEUE_H #define _LINUX_MQUEUE_H +#include + #define MQ_PRIO_MAX32768 /* per-uid limit of kernel memory used by mqueue, in bytes */ #define MQ_BYTES_MAX 819200 -- 2.5.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] uapi: mqueue.h: add missing linux/types.h include
From: Mike Frysinger <vap...@chromium.org> Commit 63159f5dcccb3858d88aaef800c4ee0eb4cc8577 changed the types from long to __kernel_long_t, but didn't add a linux/types.h include. Code that tries to include this header directly breaks: /usr/include/linux/mqueue.h:26:2: error: unknown type name '__kernel_long_t' __kernel_long_t mq_flags; /* message queue flags */ This also upsets configure tests for this header: checking linux/mqueue.h usability... no checking linux/mqueue.h presence... yes configure: WARNING: linux/mqueue.h: present but cannot be compiled configure: WARNING: linux/mqueue.h: check for missing prerequisite headers? configure: WARNING: linux/mqueue.h: see the Autoconf documentation configure: WARNING: linux/mqueue.h: section "Present But Cannot Be Compiled" configure: WARNING: linux/mqueue.h: proceeding with the compiler's result checking for linux/mqueue.h... no Signed-off-by: Mike Frysinger <vap...@gentoo.org> --- include/uapi/linux/mqueue.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h index d0a2b8e..bbd5116 100644 --- a/include/uapi/linux/mqueue.h +++ b/include/uapi/linux/mqueue.h @@ -18,6 +18,8 @@ #ifndef _LINUX_MQUEUE_H #define _LINUX_MQUEUE_H +#include + #define MQ_PRIO_MAX32768 /* per-uid limit of kernel memory used by mqueue, in bytes */ #define MQ_BYTES_MAX 819200 -- 2.5.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] userns/capability: Add user namespace capability
On 18 Oct 2015 22:13, Tobias Markus wrote: > On 17.10.2015 22:17, Richard Weinberger wrote: > > On Sat, Oct 17, 2015 at 5:58 PM, Tobias Markus wrote: > >> One question remains though: Does this break userspace executables that > >> expect being able to create user namespaces without priviledge? Since > >> creating user namespaces without CAP_SYS_ADMIN was not possible before > >> Linux 3.8, programs should already expect a potential EPERM upon calling > >> clone. Since creating a user namespace without CAP_SYS_USER_NS would > >> also cause EPERM, we should be on the safe side. > > > > In case of doubt, yes it will break existing software. > > Hiding user namespaces behind CAP_SYS_USER_NS will not magically > > make them secure. > > The goal is not to make user namespaces secure, but to limit access to > them somewhat in order to reduce the potential attack surface. the irony is that disallowing non-privileged processes access to userns means processes cannot jail themselves and thus make themselves more secure. i've been adding userns to various projects purely to get access to things like mount, net, pid, sysv, and ipc namespaces. putting this behind a cap also breaks the Chromium sandbox -- they were able to drop set*id on the sandbox binary and utilize userns instead. https://chromium.googlesource.com/chromium/src/+/master/docs/linux_sandboxing.md https://code.google.com/p/chromium/issues/detail?id=312380 -mike signature.asc Description: Digital signature
Re: [PATCH] userns/capability: Add user namespace capability
On 18 Oct 2015 22:13, Tobias Markus wrote: > On 17.10.2015 22:17, Richard Weinberger wrote: > > On Sat, Oct 17, 2015 at 5:58 PM, Tobias Markuswrote: > >> One question remains though: Does this break userspace executables that > >> expect being able to create user namespaces without priviledge? Since > >> creating user namespaces without CAP_SYS_ADMIN was not possible before > >> Linux 3.8, programs should already expect a potential EPERM upon calling > >> clone. Since creating a user namespace without CAP_SYS_USER_NS would > >> also cause EPERM, we should be on the safe side. > > > > In case of doubt, yes it will break existing software. > > Hiding user namespaces behind CAP_SYS_USER_NS will not magically > > make them secure. > > The goal is not to make user namespaces secure, but to limit access to > them somewhat in order to reduce the potential attack surface. the irony is that disallowing non-privileged processes access to userns means processes cannot jail themselves and thus make themselves more secure. i've been adding userns to various projects purely to get access to things like mount, net, pid, sysv, and ipc namespaces. putting this behind a cap also breaks the Chromium sandbox -- they were able to drop set*id on the sandbox binary and utilize userns instead. https://chromium.googlesource.com/chromium/src/+/master/docs/linux_sandboxing.md https://code.google.com/p/chromium/issues/detail?id=312380 -mike signature.asc Description: Digital signature
Re: handling of supplemental groups with userns
On 22 Sep 2015 17:52, Mike Frysinger wrote: > On 22 Sep 2015 14:40, Eric W. Biederman wrote: > > Mike Frysinger writes: > > > in the mean time, a "quick" fix might be to change new_idmap_permitted > > > to walk all the extents, and if all the ranges are set to 1, check the > > > supplemental groups in addition to the current egid ? > > > > That allows dropping groups that you could not drop normally and so we > > can't allow it, by default. > > if setgroups is set to deny, then it's not possible for me to drop any > groups, and therefore allowing me to map supplemental groups wouldn't be > a problem right ? so this does open up a permission people do not have today by themselves. since there's no requirement that the current gid be in the supplemental groups, they could drop a single group by changing to a supp one. for example, if it looked like: getuid() = 1000 getgid() = 1000 getgroups() = {100, 250} then if 1000/100/250 were mapped in, they could do: setgid(100) getgid() = 100 getgroups() = {100, 250} and thus group 1000 has been dropped. the reason this doesn't happen today is that writes to gid_map is permitted only if the writer has CAP_SETGID in the parent userns and the gid has been mapped there. normal users don't start out with that cap, and any setuid helper has to make sure to drop caps & setuid/setgid before execing the next layer (i.e. do the standard stuff). so the only way to permit this behavior is if the knobs were extended and we had a parallel USERNS_SETGID_ALLOWED. or we updated setgid to check the bit USERNS_SETGROUPS_ALLOWED since it seems in spirit to cover that. -mike signature.asc Description: Digital signature
Re: handling of supplemental groups with userns
On 22 Sep 2015 17:52, Mike Frysinger wrote: > On 22 Sep 2015 14:40, Eric W. Biederman wrote: > > Mike Frysinger writes: > > > in the mean time, a "quick" fix might be to change new_idmap_permitted > > > to walk all the extents, and if all the ranges are set to 1, check the > > > supplemental groups in addition to the current egid ? > > > > That allows dropping groups that you could not drop normally and so we > > can't allow it, by default. > > if setgroups is set to deny, then it's not possible for me to drop any > groups, and therefore allowing me to map supplemental groups wouldn't be > a problem right ? so this does open up a permission people do not have today by themselves. since there's no requirement that the current gid be in the supplemental groups, they could drop a single group by changing to a supp one. for example, if it looked like: getuid() = 1000 getgid() = 1000 getgroups() = {100, 250} then if 1000/100/250 were mapped in, they could do: setgid(100) getgid() = 100 getgroups() = {100, 250} and thus group 1000 has been dropped. the reason this doesn't happen today is that writes to gid_map is permitted only if the writer has CAP_SETGID in the parent userns and the gid has been mapped there. normal users don't start out with that cap, and any setuid helper has to make sure to drop caps & setuid/setgid before execing the next layer (i.e. do the standard stuff). so the only way to permit this behavior is if the knobs were extended and we had a parallel USERNS_SETGID_ALLOWED. or we updated setgid to check the bit USERNS_SETGROUPS_ALLOWED since it seems in spirit to cover that. -mike signature.asc Description: Digital signature
Re: handling of supplemental groups with userns
On 22 Sep 2015 14:40, Eric W. Biederman wrote: > Mike Frysinger writes: > > is it possible to map in supplemental groups in a userns when the user > > lacks setgid/etc... capabilities in the parent ns ? it doesn't seem > > like it's currently possible, but is there a reason to not enable it ? > > In your unprivileged use scenario, you won't be able to drop > your supplementary groups so why do you need them mapped? > > > basically i have a build tool that i want to isolate a bit, but it > > requires access to some of my supplemental groups. if i map just > > my effective uid/gid, the build will fail when it tries to use the > > chown/chgrp commands (gets back EINVAL). > > Yes. That really isn't valid as you are dropping groups. Peculiarly > enough dropping groups can be a security issue as in some permission > checks not being a member of a group can give you enhanced access to > files and directories. i don't want to drop groups ... i want the exact opposite actually :). ideally, `id` would have the same output before/after. instead, i get 65534 for all the supplemental groups. these commands work before i create a new userns and i want them to keep working afterwards: chgrp 100 foo chgrp 250 foo instead, only the first works (since that's my effective gid) and the second fails (since i'm in that via a supplemental group). > So to do something like what you want, you need a setuid helper (something > like newuidmap or newgidmap) to verify that what you are doing is ok > by local policy. i know i can get it ahead of time if i have the caps apriori, but that's not what i want to require. if i had those, then i would generally be able to simply create the namespaces directly and not bother with userns in the first place :). > > my scenario boils down to: > > - normal unprivileged user (uid=8282) > > - member of multiple groups (gid=100, getgroups={100,16,250,...}) > > - create a new userns (to get access to other ns like mount/pid) > >but still have access to existing groups where i'm root > > - use various features that require caps (new pidns/mntns/etc...) > > - create another userns and map back non-root users/groups > > i.e. i switch from 8282 to 0, do what i need, then switch back to 8282. > > [snip] > > > in the mean time, a "quick" fix might be to change new_idmap_permitted > > to walk all the extents, and if all the ranges are set to 1, check the > > supplemental groups in addition to the current egid ? > > That allows dropping groups that you could not drop normally and so we > can't allow it, by default. if setgroups is set to deny, then it's not possible for me to drop any groups, and therefore allowing me to map supplemental groups wouldn't be a problem right ? -mike signature.asc Description: Digital signature
handling of supplemental groups with userns
is it possible to map in supplemental groups in a userns when the user lacks setgid/etc... capabilities in the parent ns ? it doesn't seem like it's currently possible, but is there a reason to not enable it ? basically i have a build tool that i want to isolate a bit, but it requires access to some of my supplemental groups. if i map just my effective uid/gid, the build will fail when it tries to use the chown/chgrp commands (gets back EINVAL). my scenario boils down to: - normal unprivileged user (uid=8282) - member of multiple groups (gid=100, getgroups={100,16,250,...}) - create a new userns (to get access to other ns like mount/pid) but still have access to existing groups where i'm root - use various features that require caps (new pidns/mntns/etc...) - create another userns and map back non-root users/groups i.e. i switch from 8282 to 0, do what i need, then switch back to 8282. i've attached a simple test program to show the issue. it can map the current uid/gid fine, but fails to do so with a supplemental group. my reading of kernel/user_namespace.c shows that it's not possible: (1) if gid_map has any entries, map_write bails early with EPERM (2) if gid_map is empty, then writing a single entry (e.g. "0 100 1") works, but then a 2nd write runs into (1) (3) if gid_map is empty, then writing multiple entries (e.g. "0 100 1\n250 250 1\n") fails in new_idmap_permitted -- the first check is skipped (since new_map->nr_extents is 2), and the user does not have caps in the parent ns i'm aware UID_GID_MAP_MAX_EXTENTS is low, so it's not even possible if i had the caps to map all the existing supplemental groups, which is why i only want to map the few critical groups. but assuming this use case is one we want to support, maybe it makes sense to add a knob to map all of the user's supplemental groups ? in the mean time, a "quick" fix might be to change new_idmap_permitted to walk all the extents, and if all the ranges are set to 1, check the supplemental groups in addition to the current egid ? -mike /* To compile: * gcc test.c * To run: * ./a.out [1|2|3|4|5] * ./a.out -1 "string to write to gid_map" */ #define _GNU_SOURCE #include #include #include #include #include #include #include #include int main(int argc, char *argv[]) { int ret; FILE *fp; int uid = getuid(); int gid = getgid(); int nuid = 0; int ngid = 0; int suppgid; gid_t *groups; size_t i, gsize; int group_mode; switch (argc) { case 0: case 1: group_mode = 1; case 2: group_mode = atoi(argv[1]); case 3: group_mode = -1; } /* Find one supplemental group */ gsize = getgroups(0, NULL); assert(gsize != 0); groups = malloc(sizeof(*groups) * gsize); ret = getgroups(gsize, groups); assert(ret == gsize); for (i = 0; i < gsize; ++i) if (groups[i] != gid) { suppgid = groups[i]; break; } if (i == gsize) errx(1, "could not find a supplemental group to test"); free(groups); /* Create new userns */ assert(unshare(CLONE_NEWUSER) == 0); /* Map the current uid */ fp = fopen("/proc/self/uid_map", "we"); assert(fp); setbuf(fp, NULL); fprintf(fp, "%i %i 1", nuid, uid); fclose(fp); /* Disable setgroups() */ fp = fopen("/proc/self/setgroups", "we"); assert(fp); setbuf(fp, NULL); fputs("deny", fp); fclose(fp); /* Map the various gids */ fp = fopen("/proc/self/gid_map", "we"); assert(fp); setbuf(fp, NULL); switch (group_mode) { case 1: /* This works */ fprintf(fp, "%i %i 1\n", ngid, gid); break; case 2: /* This fails */ fprintf(fp, "%i %i 1\n", ngid, gid); fprintf(fp, "%i %i 1\n", suppgid, suppgid); break; case 3: /* This fails */ fprintf(fp, "%i %i 1\n%i %i 1\n", ngid, gid, suppgid, suppgid); break; case 4: /* This fails */ fprintf(fp, "%i %i 1\n", suppgid, suppgid); break; case 5: /* This fails */ fprintf(fp, "0 0 1\n"); break; case -1: fprintf(fp, argv[2]); break; } fclose(fp); /* Validate */ printf("uid:%i gid:%i\n", getuid(), getgid()); gsize = getgroups(0, NULL); assert(gsize != 0); groups = malloc(sizeof(*groups) * gsize); ret = getgroups(gsize, groups); assert(ret == gsize); printf("groups: "); for (i = 0; i < gsize; ++i) printf("%i ", groups[i]); printf("\n"); free(groups); } signature.asc Description: Digital signature
handling of supplemental groups with userns
is it possible to map in supplemental groups in a userns when the user lacks setgid/etc... capabilities in the parent ns ? it doesn't seem like it's currently possible, but is there a reason to not enable it ? basically i have a build tool that i want to isolate a bit, but it requires access to some of my supplemental groups. if i map just my effective uid/gid, the build will fail when it tries to use the chown/chgrp commands (gets back EINVAL). my scenario boils down to: - normal unprivileged user (uid=8282) - member of multiple groups (gid=100, getgroups={100,16,250,...}) - create a new userns (to get access to other ns like mount/pid) but still have access to existing groups where i'm root - use various features that require caps (new pidns/mntns/etc...) - create another userns and map back non-root users/groups i.e. i switch from 8282 to 0, do what i need, then switch back to 8282. i've attached a simple test program to show the issue. it can map the current uid/gid fine, but fails to do so with a supplemental group. my reading of kernel/user_namespace.c shows that it's not possible: (1) if gid_map has any entries, map_write bails early with EPERM (2) if gid_map is empty, then writing a single entry (e.g. "0 100 1") works, but then a 2nd write runs into (1) (3) if gid_map is empty, then writing multiple entries (e.g. "0 100 1\n250 250 1\n") fails in new_idmap_permitted -- the first check is skipped (since new_map->nr_extents is 2), and the user does not have caps in the parent ns i'm aware UID_GID_MAP_MAX_EXTENTS is low, so it's not even possible if i had the caps to map all the existing supplemental groups, which is why i only want to map the few critical groups. but assuming this use case is one we want to support, maybe it makes sense to add a knob to map all of the user's supplemental groups ? in the mean time, a "quick" fix might be to change new_idmap_permitted to walk all the extents, and if all the ranges are set to 1, check the supplemental groups in addition to the current egid ? -mike /* To compile: * gcc test.c * To run: * ./a.out [1|2|3|4|5] * ./a.out -1 "string to write to gid_map" */ #define _GNU_SOURCE #include #include #include #include #include #include #include #include int main(int argc, char *argv[]) { int ret; FILE *fp; int uid = getuid(); int gid = getgid(); int nuid = 0; int ngid = 0; int suppgid; gid_t *groups; size_t i, gsize; int group_mode; switch (argc) { case 0: case 1: group_mode = 1; case 2: group_mode = atoi(argv[1]); case 3: group_mode = -1; } /* Find one supplemental group */ gsize = getgroups(0, NULL); assert(gsize != 0); groups = malloc(sizeof(*groups) * gsize); ret = getgroups(gsize, groups); assert(ret == gsize); for (i = 0; i < gsize; ++i) if (groups[i] != gid) { suppgid = groups[i]; break; } if (i == gsize) errx(1, "could not find a supplemental group to test"); free(groups); /* Create new userns */ assert(unshare(CLONE_NEWUSER) == 0); /* Map the current uid */ fp = fopen("/proc/self/uid_map", "we"); assert(fp); setbuf(fp, NULL); fprintf(fp, "%i %i 1", nuid, uid); fclose(fp); /* Disable setgroups() */ fp = fopen("/proc/self/setgroups", "we"); assert(fp); setbuf(fp, NULL); fputs("deny", fp); fclose(fp); /* Map the various gids */ fp = fopen("/proc/self/gid_map", "we"); assert(fp); setbuf(fp, NULL); switch (group_mode) { case 1: /* This works */ fprintf(fp, "%i %i 1\n", ngid, gid); break; case 2: /* This fails */ fprintf(fp, "%i %i 1\n", ngid, gid); fprintf(fp, "%i %i 1\n", suppgid, suppgid); break; case 3: /* This fails */ fprintf(fp, "%i %i 1\n%i %i 1\n", ngid, gid, suppgid, suppgid); break; case 4: /* This fails */ fprintf(fp, "%i %i 1\n", suppgid, suppgid); break; case 5: /* This fails */ fprintf(fp, "0 0 1\n"); break; case -1: fprintf(fp, argv[2]); break; } fclose(fp); /* Validate */ printf("uid:%i gid:%i\n", getuid(), getgid()); gsize = getgroups(0, NULL); assert(gsize != 0); groups = malloc(sizeof(*groups) * gsize); ret = getgroups(gsize, groups); assert(ret == gsize); printf("groups: "); for (i = 0; i < gsize; ++i) printf("%i ", groups[i]); printf("\n"); free(groups); } signature.asc Description: Digital signature
Re: handling of supplemental groups with userns
On 22 Sep 2015 14:40, Eric W. Biederman wrote: > Mike Frysinger writes: > > is it possible to map in supplemental groups in a userns when the user > > lacks setgid/etc... capabilities in the parent ns ? it doesn't seem > > like it's currently possible, but is there a reason to not enable it ? > > In your unprivileged use scenario, you won't be able to drop > your supplementary groups so why do you need them mapped? > > > basically i have a build tool that i want to isolate a bit, but it > > requires access to some of my supplemental groups. if i map just > > my effective uid/gid, the build will fail when it tries to use the > > chown/chgrp commands (gets back EINVAL). > > Yes. That really isn't valid as you are dropping groups. Peculiarly > enough dropping groups can be a security issue as in some permission > checks not being a member of a group can give you enhanced access to > files and directories. i don't want to drop groups ... i want the exact opposite actually :). ideally, `id` would have the same output before/after. instead, i get 65534 for all the supplemental groups. these commands work before i create a new userns and i want them to keep working afterwards: chgrp 100 foo chgrp 250 foo instead, only the first works (since that's my effective gid) and the second fails (since i'm in that via a supplemental group). > So to do something like what you want, you need a setuid helper (something > like newuidmap or newgidmap) to verify that what you are doing is ok > by local policy. i know i can get it ahead of time if i have the caps apriori, but that's not what i want to require. if i had those, then i would generally be able to simply create the namespaces directly and not bother with userns in the first place :). > > my scenario boils down to: > > - normal unprivileged user (uid=8282) > > - member of multiple groups (gid=100, getgroups={100,16,250,...}) > > - create a new userns (to get access to other ns like mount/pid) > >but still have access to existing groups where i'm root > > - use various features that require caps (new pidns/mntns/etc...) > > - create another userns and map back non-root users/groups > > i.e. i switch from 8282 to 0, do what i need, then switch back to 8282. > > [snip] > > > in the mean time, a "quick" fix might be to change new_idmap_permitted > > to walk all the extents, and if all the ranges are set to 1, check the > > supplemental groups in addition to the current egid ? > > That allows dropping groups that you could not drop normally and so we > can't allow it, by default. if setgroups is set to deny, then it's not possible for me to drop any groups, and therefore allowing me to map supplemental groups wouldn't be a problem right ? -mike signature.asc Description: Digital signature
[PATCH] elf-em.h: move EM_MICROBLAZE to the common header
The linux/audit.h header uses EM_MICROBLAZE in order to define AUDIT_ARCH_MICROBLAZE, but it's only available in the microblaze asm headers. Move it to the common elf-em.h header so that the define can be used on non-microblaze systems. Otherwise we get build errors that EM_MICROBLAZE isn't defined when we try to use the AUDIT_ARCH_MICROBLAZE symbol. Signed-off-by: Mike Frysinger --- arch/microblaze/include/uapi/asm/elf.h | 3 ++- include/uapi/linux/elf-em.h| 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/microblaze/include/uapi/asm/elf.h b/arch/microblaze/include/uapi/asm/elf.h index be1731d..e9bcdb6 100644 --- a/arch/microblaze/include/uapi/asm/elf.h +++ b/arch/microblaze/include/uapi/asm/elf.h @@ -11,12 +11,13 @@ #ifndef _UAPI_ASM_MICROBLAZE_ELF_H #define _UAPI_ASM_MICROBLAZE_ELF_H +#include + /* * Note there is no "official" ELF designation for Microblaze. * I've snaffled the value from the microblaze binutils source code * /binutils/microblaze/include/elf/microblaze.h */ -#define EM_MICROBLAZE 189 #define EM_MICROBLAZE_OLD 0xbaab #define ELF_ARCH EM_MICROBLAZE diff --git a/include/uapi/linux/elf-em.h b/include/uapi/linux/elf-em.h index b088296..8c90a79 100644 --- a/include/uapi/linux/elf-em.h +++ b/include/uapi/linux/elf-em.h @@ -38,6 +38,7 @@ #define EM_ALTERA_NIOS2113 /* Altera Nios II soft-core processor */ #define EM_TI_C6000140 /* TI C6X DSPs */ #define EM_AARCH64 183 /* ARM 64 bit */ +#define EM_MICROBLAZE 189 /* Xilinx MicroBlaze */ #define EM_FRV 0x5441 /* Fujitsu FR-V */ #define EM_AVR32 0x18ad /* Atmel AVR32 */ -- 2.4.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] elf-em.h: move EM_MICROBLAZE to the common header
The linux/audit.h header uses EM_MICROBLAZE in order to define AUDIT_ARCH_MICROBLAZE, but it's only available in the microblaze asm headers. Move it to the common elf-em.h header so that the define can be used on non-microblaze systems. Otherwise we get build errors that EM_MICROBLAZE isn't defined when we try to use the AUDIT_ARCH_MICROBLAZE symbol. Signed-off-by: Mike Frysinger vap...@gentoo.org --- arch/microblaze/include/uapi/asm/elf.h | 3 ++- include/uapi/linux/elf-em.h| 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/microblaze/include/uapi/asm/elf.h b/arch/microblaze/include/uapi/asm/elf.h index be1731d..e9bcdb6 100644 --- a/arch/microblaze/include/uapi/asm/elf.h +++ b/arch/microblaze/include/uapi/asm/elf.h @@ -11,12 +11,13 @@ #ifndef _UAPI_ASM_MICROBLAZE_ELF_H #define _UAPI_ASM_MICROBLAZE_ELF_H +#include linux/elf-em.h + /* * Note there is no official ELF designation for Microblaze. * I've snaffled the value from the microblaze binutils source code * /binutils/microblaze/include/elf/microblaze.h */ -#define EM_MICROBLAZE 189 #define EM_MICROBLAZE_OLD 0xbaab #define ELF_ARCH EM_MICROBLAZE diff --git a/include/uapi/linux/elf-em.h b/include/uapi/linux/elf-em.h index b088296..8c90a79 100644 --- a/include/uapi/linux/elf-em.h +++ b/include/uapi/linux/elf-em.h @@ -38,6 +38,7 @@ #define EM_ALTERA_NIOS2113 /* Altera Nios II soft-core processor */ #define EM_TI_C6000140 /* TI C6X DSPs */ #define EM_AARCH64 183 /* ARM 64 bit */ +#define EM_MICROBLAZE 189 /* Xilinx MicroBlaze */ #define EM_FRV 0x5441 /* Fujitsu FR-V */ #define EM_AVR32 0x18ad /* Atmel AVR32 */ -- 2.4.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/3] vt: add cursor blink interval escape sequence
On 26 Mar 2015 13:54, Scot Doyle wrote: > Add an escape sequence to specify the current console's cursor blink > interval. The interval is specified as a number of milliseconds until > the next cursor display state toggle, from 50 to 65535. /proc/loadavg > did not show a difference with a one msec interval, but the lower > bound is set to 50 msecs since slower hardware wasn't tested. if they want to be crazy, why not let them ? it's not like we generally prevent the user from destroying their machine. i.e. just require the value to be > 0 (unless you want to let 0 disable things). -mike signature.asc Description: Digital signature
Re: [PATCH v2 1/3] vt: add cursor blink interval escape sequence
On 26 Mar 2015 13:54, Scot Doyle wrote: Add an escape sequence to specify the current console's cursor blink interval. The interval is specified as a number of milliseconds until the next cursor display state toggle, from 50 to 65535. /proc/loadavg did not show a difference with a one msec interval, but the lower bound is set to 50 msecs since slower hardware wasn't tested. if they want to be crazy, why not let them ? it's not like we generally prevent the user from destroying their machine. i.e. just require the value to be 0 (unless you want to let 0 disable things). -mike signature.asc Description: Digital signature
[PATCH] netlink: drop (int) cast on length arg in NLMSG_OK
The NLMSG_OK macro compares three things: - the len arg from the user - a size_t: sizeof(struct nlmsghdr) - an int: sizeof(struct nlmsghdr) casted - an u32: the nlmsghdr->nlmsg_len member When building with -Wsign-compare, this macro triggers a signed compare warning. This is because it compares len to an int, and then compares it to a u32. If len is signed, we get a warning due to the last test. If len is unsigned, we get a warning due to the first test. Like in strace: socketutils.c:145:8: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Lets drop the int cast on the first sizeof. This way, once the user casts len to an unsigned value, everything shakes out correctly. Signed-off-by: Mike Frysinger --- include/uapi/linux/netlink.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h index 1a85940..3eb4e24 100644 --- a/include/uapi/linux/netlink.h +++ b/include/uapi/linux/netlink.h @@ -84,7 +84,7 @@ struct nlmsghdr { #define NLMSG_DATA(nlh) ((void*)(((char*)nlh) + NLMSG_LENGTH(0))) #define NLMSG_NEXT(nlh,len) ((len) -= NLMSG_ALIGN((nlh)->nlmsg_len), \ (struct nlmsghdr*)(((char*)(nlh)) + NLMSG_ALIGN((nlh)->nlmsg_len))) -#define NLMSG_OK(nlh,len) ((len) >= (int)sizeof(struct nlmsghdr) && \ +#define NLMSG_OK(nlh,len) ((len) >= sizeof(struct nlmsghdr) && \ (nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \ (nlh)->nlmsg_len <= (len)) #define NLMSG_PAYLOAD(nlh,len) ((nlh)->nlmsg_len - NLMSG_SPACE((len))) -- 2.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] netlink: drop (int) cast on length arg in NLMSG_OK
The NLMSG_OK macro compares three things: - the len arg from the user - a size_t: sizeof(struct nlmsghdr) - an int: sizeof(struct nlmsghdr) casted - an u32: the nlmsghdr-nlmsg_len member When building with -Wsign-compare, this macro triggers a signed compare warning. This is because it compares len to an int, and then compares it to a u32. If len is signed, we get a warning due to the last test. If len is unsigned, we get a warning due to the first test. Like in strace: socketutils.c:145:8: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Lets drop the int cast on the first sizeof. This way, once the user casts len to an unsigned value, everything shakes out correctly. Signed-off-by: Mike Frysinger vap...@gentoo.org --- include/uapi/linux/netlink.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h index 1a85940..3eb4e24 100644 --- a/include/uapi/linux/netlink.h +++ b/include/uapi/linux/netlink.h @@ -84,7 +84,7 @@ struct nlmsghdr { #define NLMSG_DATA(nlh) ((void*)(((char*)nlh) + NLMSG_LENGTH(0))) #define NLMSG_NEXT(nlh,len) ((len) -= NLMSG_ALIGN((nlh)-nlmsg_len), \ (struct nlmsghdr*)(((char*)(nlh)) + NLMSG_ALIGN((nlh)-nlmsg_len))) -#define NLMSG_OK(nlh,len) ((len) = (int)sizeof(struct nlmsghdr) \ +#define NLMSG_OK(nlh,len) ((len) = sizeof(struct nlmsghdr) \ (nlh)-nlmsg_len = sizeof(struct nlmsghdr) \ (nlh)-nlmsg_len = (len)) #define NLMSG_PAYLOAD(nlh,len) ((nlh)-nlmsg_len - NLMSG_SPACE((len))) -- 2.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] MAINTAINERS: fix file encoding
This file is largely UTF-8 except for this one entry which is ISO-8859-1. By mixing the encodings, grep thinks the file is binary. Signed-off-by: Mike Frysinger --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index eaf9996..7ee86dd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -725,7 +725,7 @@ F: staging/iio/trigger/iio-trig-bfin-timer.c ANDROID DRIVERS M: Greg Kroah-Hartman -M: Arve Hj�nnev�g +M: Arve Hjønnevåg M: Riley Andrews T: git git://git.kernel.org/pub/scm/linux/kernel/gregkh/staging.git L: de...@driverdev.osuosl.org -- 2.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] MAINTAINERS: fix file encoding
This file is largely UTF-8 except for this one entry which is ISO-8859-1. By mixing the encodings, grep thinks the file is binary. Signed-off-by: Mike Frysinger vap...@gentoo.org --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index eaf9996..7ee86dd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -725,7 +725,7 @@ F: staging/iio/trigger/iio-trig-bfin-timer.c ANDROID DRIVERS M: Greg Kroah-Hartman gre...@linuxfoundation.org -M: Arve Hj�nnev�g a...@android.com +M: Arve Hjønnevåg a...@android.com M: Riley Andrews riandr...@android.com T: git git://git.kernel.org/pub/scm/linux/kernel/gregkh/staging.git L: de...@driverdev.osuosl.org -- 2.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bug#772807: binfmt-support: unable to close /proc/sys/fs/binfmt_misc/register: Invalid argument
On 12 Dec 2014 06:01, Al Viro wrote: > On Fri, Dec 12, 2014 at 02:51:55PM +1030, Arthur Marsh wrote: > > 6b899c4e9a049dfca759d990bd53b14f81c3626c is the first bad commit > > commit 6b899c4e9a049dfca759d990bd53b14f81c3626c > > Author: Mike Frysinger > > Date: Wed Dec 10 15:52:08 2014 -0800 > > > > binfmt_misc: add comments & debug logs > > > > When trying to develop a custom format handler, the errors returned all > > effectively get bucketed as EINVAL with no kernel messages. The other > > errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a > > bad handler is rejected, the developer has to walk the dense code and > > try to guess where it went wrong. Needing to dive into kernel code is > > itself a fairly high barrier for a lot of people. > > > > To improve this situation, let's deploy extensive pr_debug markers at > > logical parse points, and add comments to the dense parsing logic. It > > let's you see exactly where the parsing aborts, the string the kernel > > received (useful when dealing with shell code), how it translated the > > buffers to binary data, and how it will apply the mask at runtime. > > ... and looking at it shows the following garbage: > p[-1] = '\0'; > - if (!e->magic[0]) > + if (p == e->magic) > > That code makes no sense whatsoever - if p *was* equal to e->magic, the > assignment before it would be obviously broken. And a few lines later we > have another chunk just like that. > > Moreover, if you look at further context, you'll see that it's > e->magic = p; > p = scanarg(p, del); > if (!p) > sod off > p[-1] = '\0'; > if (p == e->magic) > sod off > Now, that condition could be true only if scanarg(p, del) would return p, > right? Let's take a look at scanarg(): > static char *scanarg(char *s, char del) > { > char c; > > while ((c = *s++) != del) { > if (c == '\\' && *s == 'x') { > s++; > if (!isxdigit(*s++)) > return NULL; > if (!isxdigit(*s++)) > return NULL; > } > } > return s; > } > > See that s++ in the loop condition? There's no way in hell it would *not* > increment s. If we return non-NULL, we know that c was equal to del *and* > c is equal to previous_value_of_s[0], i.e. s[-1]. IOW, return value points > to the byte following the first instance of delimeter starting at the > argument. > > And p[-1] = '\0' replaces delimiter with NUL. IOW, the checks used to be > correct. And got buggered. the checks are not correct. magic & mask are binary fields hence checking for a NUL byte to indicate string parsing failed makes no sense. your patch restores the bug i attempted to fix -- if i wanted to ignore the first byte of the file by setting the mask or magic to 0, then binfmt_misc will wrongly reject it. the current code does reject some bad inputs -- namely, when the magic or mask is not specified. i was counting on the scanarg not incrementing the pointer in that case, but as you pointed out, that doesn't work since the func always increments the pointer. i'll see if i can handle both cases. > Subject: unfuck fs/binfmt_misc.c > > Undo some of the "prettifying" braindamage. commit 7d65cf10e3d7747033b83fa18c5f3d2a498f66bc has merged at this point, but the attribution to e6084d4 is wrong. of course coding style changes & functional changes shouldn't be done which is why i didn't do it. the change you're referring to above has nothing to do e6084d4 as it was added before that in 6b899c4e9a049dfca759d990bd53b14f81c3626c (where i added extended debug output). arguably those few non-debug related lines shouldn't be in that commit, but it's a long cry from style changes. -mike signature.asc Description: Digital signature
[PATCH] timerfd: export defines to userspace
Since userspace is expected to call timerfd syscalls directly with these flags/ioctls, make sure we export them so they don't have to duplicate the values themselves. Signed-off-by: Mike Frysinger --- include/linux/timerfd.h | 20 +--- include/uapi/linux/Kbuild| 1 + include/uapi/linux/timerfd.h | 36 3 files changed, 38 insertions(+), 19 deletions(-) create mode 100644 include/uapi/linux/timerfd.h diff --git a/include/linux/timerfd.h b/include/linux/timerfd.h index bd36ce4..bab0b1a 100644 --- a/include/linux/timerfd.h +++ b/include/linux/timerfd.h @@ -8,23 +8,7 @@ #ifndef _LINUX_TIMERFD_H #define _LINUX_TIMERFD_H -/* For O_CLOEXEC and O_NONBLOCK */ -#include - -/* For _IO helpers */ -#include - -/* - * CAREFUL: Check include/asm-generic/fcntl.h when defining - * new flags, since they might collide with O_* ones. We want - * to re-use O_* flags that couldn't possibly have a meaning - * from eventfd, in order to leave a free define-space for - * shared O_* flags. - */ -#define TFD_TIMER_ABSTIME (1 << 0) -#define TFD_TIMER_CANCEL_ON_SET (1 << 1) -#define TFD_CLOEXEC O_CLOEXEC -#define TFD_NONBLOCK O_NONBLOCK +#include #define TFD_SHARED_FCNTL_FLAGS (TFD_CLOEXEC | TFD_NONBLOCK) /* Flags for timerfd_create. */ @@ -32,6 +16,4 @@ /* Flags for timerfd_settime. */ #define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET) -#define TFD_IOC_SET_TICKS _IOW('T', 0, u64) - #endif /* _LINUX_TIMERFD_H */ diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild index 8523f9b..62ac5e7 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -384,6 +384,7 @@ header-y += tcp_metrics.h header-y += telephony.h header-y += termios.h header-y += time.h +header-y += timerfd.h header-y += times.h header-y += timex.h header-y += tiocl.h diff --git a/include/uapi/linux/timerfd.h b/include/uapi/linux/timerfd.h new file mode 100644 index 000..6fcfaa8 --- /dev/null +++ b/include/uapi/linux/timerfd.h @@ -0,0 +1,36 @@ +/* + * include/linux/timerfd.h + * + * Copyright (C) 2007 Davide Libenzi + * + */ + +#ifndef _UAPI_LINUX_TIMERFD_H +#define _UAPI_LINUX_TIMERFD_H + +#include + +/* For O_CLOEXEC and O_NONBLOCK */ +#include + +/* For _IO helpers */ +#include + +/* + * CAREFUL: Check include/asm-generic/fcntl.h when defining + * new flags, since they might collide with O_* ones. We want + * to re-use O_* flags that couldn't possibly have a meaning + * from eventfd, in order to leave a free define-space for + * shared O_* flags. + * + * Also make sure to update the masks in include/linux/timerfd.h + * when adding new flags. + */ +#define TFD_TIMER_ABSTIME (1 << 0) +#define TFD_TIMER_CANCEL_ON_SET (1 << 1) +#define TFD_CLOEXEC O_CLOEXEC +#define TFD_NONBLOCK O_NONBLOCK + +#define TFD_IOC_SET_TICKS _IOW('T', 0, __u64) + +#endif /* _UAPI_LINUX_TIMERFD_H */ -- 2.2.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] timerfd: export defines to userspace
Since userspace is expected to call timerfd syscalls directly with these flags/ioctls, make sure we export them so they don't have to duplicate the values themselves. Signed-off-by: Mike Frysinger vap...@gentoo.org --- include/linux/timerfd.h | 20 +--- include/uapi/linux/Kbuild| 1 + include/uapi/linux/timerfd.h | 36 3 files changed, 38 insertions(+), 19 deletions(-) create mode 100644 include/uapi/linux/timerfd.h diff --git a/include/linux/timerfd.h b/include/linux/timerfd.h index bd36ce4..bab0b1a 100644 --- a/include/linux/timerfd.h +++ b/include/linux/timerfd.h @@ -8,23 +8,7 @@ #ifndef _LINUX_TIMERFD_H #define _LINUX_TIMERFD_H -/* For O_CLOEXEC and O_NONBLOCK */ -#include linux/fcntl.h - -/* For _IO helpers */ -#include linux/ioctl.h - -/* - * CAREFUL: Check include/asm-generic/fcntl.h when defining - * new flags, since they might collide with O_* ones. We want - * to re-use O_* flags that couldn't possibly have a meaning - * from eventfd, in order to leave a free define-space for - * shared O_* flags. - */ -#define TFD_TIMER_ABSTIME (1 0) -#define TFD_TIMER_CANCEL_ON_SET (1 1) -#define TFD_CLOEXEC O_CLOEXEC -#define TFD_NONBLOCK O_NONBLOCK +#include uapi/linux/timerfd.h #define TFD_SHARED_FCNTL_FLAGS (TFD_CLOEXEC | TFD_NONBLOCK) /* Flags for timerfd_create. */ @@ -32,6 +16,4 @@ /* Flags for timerfd_settime. */ #define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET) -#define TFD_IOC_SET_TICKS _IOW('T', 0, u64) - #endif /* _LINUX_TIMERFD_H */ diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild index 8523f9b..62ac5e7 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -384,6 +384,7 @@ header-y += tcp_metrics.h header-y += telephony.h header-y += termios.h header-y += time.h +header-y += timerfd.h header-y += times.h header-y += timex.h header-y += tiocl.h diff --git a/include/uapi/linux/timerfd.h b/include/uapi/linux/timerfd.h new file mode 100644 index 000..6fcfaa8 --- /dev/null +++ b/include/uapi/linux/timerfd.h @@ -0,0 +1,36 @@ +/* + * include/linux/timerfd.h + * + * Copyright (C) 2007 Davide Libenzi davi...@xmailserver.org + * + */ + +#ifndef _UAPI_LINUX_TIMERFD_H +#define _UAPI_LINUX_TIMERFD_H + +#include linux/types.h + +/* For O_CLOEXEC and O_NONBLOCK */ +#include linux/fcntl.h + +/* For _IO helpers */ +#include linux/ioctl.h + +/* + * CAREFUL: Check include/asm-generic/fcntl.h when defining + * new flags, since they might collide with O_* ones. We want + * to re-use O_* flags that couldn't possibly have a meaning + * from eventfd, in order to leave a free define-space for + * shared O_* flags. + * + * Also make sure to update the masks in include/linux/timerfd.h + * when adding new flags. + */ +#define TFD_TIMER_ABSTIME (1 0) +#define TFD_TIMER_CANCEL_ON_SET (1 1) +#define TFD_CLOEXEC O_CLOEXEC +#define TFD_NONBLOCK O_NONBLOCK + +#define TFD_IOC_SET_TICKS _IOW('T', 0, __u64) + +#endif /* _UAPI_LINUX_TIMERFD_H */ -- 2.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bug#772807: binfmt-support: unable to close /proc/sys/fs/binfmt_misc/register: Invalid argument
On 12 Dec 2014 06:01, Al Viro wrote: On Fri, Dec 12, 2014 at 02:51:55PM +1030, Arthur Marsh wrote: 6b899c4e9a049dfca759d990bd53b14f81c3626c is the first bad commit commit 6b899c4e9a049dfca759d990bd53b14f81c3626c Author: Mike Frysinger vap...@gentoo.org Date: Wed Dec 10 15:52:08 2014 -0800 binfmt_misc: add comments debug logs When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It let's you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. ... and looking at it shows the following garbage: p[-1] = '\0'; - if (!e-magic[0]) + if (p == e-magic) That code makes no sense whatsoever - if p *was* equal to e-magic, the assignment before it would be obviously broken. And a few lines later we have another chunk just like that. Moreover, if you look at further context, you'll see that it's e-magic = p; p = scanarg(p, del); if (!p) sod off p[-1] = '\0'; if (p == e-magic) sod off Now, that condition could be true only if scanarg(p, del) would return p, right? Let's take a look at scanarg(): static char *scanarg(char *s, char del) { char c; while ((c = *s++) != del) { if (c == '\\' *s == 'x') { s++; if (!isxdigit(*s++)) return NULL; if (!isxdigit(*s++)) return NULL; } } return s; } See that s++ in the loop condition? There's no way in hell it would *not* increment s. If we return non-NULL, we know that c was equal to del *and* c is equal to previous_value_of_s[0], i.e. s[-1]. IOW, return value points to the byte following the first instance of delimeter starting at the argument. And p[-1] = '\0' replaces delimiter with NUL. IOW, the checks used to be correct. And got buggered. the checks are not correct. magic mask are binary fields hence checking for a NUL byte to indicate string parsing failed makes no sense. your patch restores the bug i attempted to fix -- if i wanted to ignore the first byte of the file by setting the mask or magic to 0, then binfmt_misc will wrongly reject it. the current code does reject some bad inputs -- namely, when the magic or mask is not specified. i was counting on the scanarg not incrementing the pointer in that case, but as you pointed out, that doesn't work since the func always increments the pointer. i'll see if i can handle both cases. Subject: unfuck fs/binfmt_misc.c Undo some of the prettifying braindamage. commit 7d65cf10e3d7747033b83fa18c5f3d2a498f66bc has merged at this point, but the attribution to e6084d4 is wrong. of course coding style changes functional changes shouldn't be done which is why i didn't do it. the change you're referring to above has nothing to do e6084d4 as it was added before that in 6b899c4e9a049dfca759d990bd53b14f81c3626c (where i added extended debug output). arguably those few non-debug related lines shouldn't be in that commit, but it's a long cry from style changes. -mike signature.asc Description: Digital signature
Re: [PATCH 1/2 v2] binfmt_misc: add comments & debug logs
On 28 Oct 2014 15:58, Andrew Morton wrote: > On Mon, 20 Oct 2014 19:54:14 -0400 Mike Frysinger wrote: > > On 20 Oct 2014 15:59, Joe Perches wrote: > > > On Mon, 2014-10-20 at 18:45 -0400, Mike Frysinger wrote: > > > > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c > > > [] > > > > @@ -323,46 +343,113 @@ static Node *create_entry(const char __user > > > > *buffer, size_t count) > > > [] > > > > + if (e->mask) { > > > > + int i; > > > > + char *masked = kmalloc(e->size, > > > > GFP_USER); > > > > > > Why GFP_USER? Does it need it? > > > > mostly a copy & paste from earlier in this func: > > e = kmalloc(memsize, GFP_USER); > > > > the code is running process context and this buffer is only for > > debugging on behalf of the user (and is shortly freed there after), so > > GFP_USER seemed appropriate. that said, i'm certainly not an expert > > here, so if the convention is to use GFP_KERNEL, it's easy enough to > > change. the kmalloc API doesn't seem to provide guidance. > > I can't see any reason to me using GFP_USER for these objects so how > about > > > From: Andrew Morton > Subject: fs/binfmt_misc.c: use GFP_KERNEL instead of GFP_USER > > GFP_USER means "honour cpuset nodes-allowed beancounting". These are > regular old kernel objects and there seems no reason to give them this > treatment. tracing the source bits showed that as the only difference i could fine, but as to what they actually impacts, i'm not sure :). i don't think it's super critical though considering only root users can update this, so it's hard to see how it'd be abused. Acked-by: Mike Frysinger -mike signature.asc Description: Digital signature
Re: [PATCH 1/2 v2] binfmt_misc: add comments debug logs
On 28 Oct 2014 15:58, Andrew Morton wrote: On Mon, 20 Oct 2014 19:54:14 -0400 Mike Frysinger wrote: On 20 Oct 2014 15:59, Joe Perches wrote: On Mon, 2014-10-20 at 18:45 -0400, Mike Frysinger wrote: diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c [] @@ -323,46 +343,113 @@ static Node *create_entry(const char __user *buffer, size_t count) [] + if (e-mask) { + int i; + char *masked = kmalloc(e-size, GFP_USER); Why GFP_USER? Does it need it? mostly a copy paste from earlier in this func: e = kmalloc(memsize, GFP_USER); the code is running process context and this buffer is only for debugging on behalf of the user (and is shortly freed there after), so GFP_USER seemed appropriate. that said, i'm certainly not an expert here, so if the convention is to use GFP_KERNEL, it's easy enough to change. the kmalloc API doesn't seem to provide guidance. I can't see any reason to me using GFP_USER for these objects so how about From: Andrew Morton a...@linux-foundation.org Subject: fs/binfmt_misc.c: use GFP_KERNEL instead of GFP_USER GFP_USER means honour cpuset nodes-allowed beancounting. These are regular old kernel objects and there seems no reason to give them this treatment. tracing the source bits showed that as the only difference i could fine, but as to what they actually impacts, i'm not sure :). i don't think it's super critical though considering only root users can update this, so it's hard to see how it'd be abused. Acked-by: Mike Frysinger vap...@gentoo.org -mike signature.asc Description: Digital signature
Re: [PATCH 1/2 v2] binfmt_misc: add comments & debug logs
On 20 Oct 2014 15:59, Joe Perches wrote: > On Mon, 2014-10-20 at 18:45 -0400, Mike Frysinger wrote: > > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c > [] > > @@ -323,46 +343,113 @@ static Node *create_entry(const char __user *buffer, > > size_t count) > [] > > + if (e->mask) { > > + int i; > > + char *masked = kmalloc(e->size, GFP_USER); > > Why GFP_USER? Does it need it? mostly a copy & paste from earlier in this func: e = kmalloc(memsize, GFP_USER); the code is running process context and this buffer is only for debugging on behalf of the user (and is shortly freed there after), so GFP_USER seemed appropriate. that said, i'm certainly not an expert here, so if the convention is to use GFP_KERNEL, it's easy enough to change. the kmalloc API doesn't seem to provide guidance. > > switch (res) { > > - case 1: clear_bit(Enabled, >flags); > > + case 1: > > + /* Disable this handler. */ > > + clear_bit(Enabled, >flags); > > break; > > - case 2: set_bit(Enabled, >flags); > > + case 2: > > + /* Enable this handler. */ > > + set_bit(Enabled, >flags); > > break; > > - case 3: root = dget(file->f_path.dentry->d_sb->s_root); > > + case 3: > > + /* Delete this handler. */ > > + root = dget(file->f_path.dentry->d_sb->s_root); > > mutex_lock(>d_inode->i_mutex); > > > > kill_node(e); > > Maybe move the case indents one tab position left style fixes (for the whole file) are in the 2nd patch. i specifically tried to keep those changes to a minimum in this patch (functionality vs formatting). -mike signature.asc Description: Digital signature
[PATCH 2/2 v2] binfmt_misc: clean up code style a bit
Clean up various coding style issues that checkpatch complains about. No functional changes here. Signed-off-by: Mike Frysinger --- v2 - rebased fs/binfmt_misc.c | 295 +++ 1 file changed, 146 insertions(+), 149 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index 5670b17..a953458 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -1,19 +1,10 @@ /* - * binfmt_misc.c + * binfmt_misc.c * - * Copyright (C) 1997 Richard Günther + * Copyright (C) 1997 Richard Günther * - * binfmt_misc detects binaries via a magic or filename extension and invokes - * a specified wrapper. This should obsolete binfmt_java, binfmt_em86 and - * binfmt_mz. - * - * 1997-04-25 first version - * [...] - * 1997-05-19 cleanup - * 1997-06-26 hpa: pass the real filename rather than argv[0] - * 1997-06-30 minor cleanup - * 1997-08-09 removed extension stripping, locking cleanup - * 2001-02-28 AV: rewritten into something that resembles C. Original didn't. + * binfmt_misc detects binaries via a magic or filename extension and invokes + * a specified wrapper. See Documentation/binfmt_misc.txt for more details. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -48,9 +39,9 @@ static LIST_HEAD(entries); static int enabled = 1; enum {Enabled, Magic}; -#define MISC_FMT_PRESERVE_ARGV0 (1<<31) -#define MISC_FMT_OPEN_BINARY (1<<30) -#define MISC_FMT_CREDENTIALS (1<<29) +#define MISC_FMT_PRESERVE_ARGV0 (1 << 31) +#define MISC_FMT_OPEN_BINARY (1 << 30) +#define MISC_FMT_CREDENTIALS (1 << 29) typedef struct { struct list_head list; @@ -134,7 +125,7 @@ static Node *check_file(struct linux_binprm *bprm) static int load_misc_binary(struct linux_binprm *bprm) { Node *fmt; - struct file * interp_file = NULL; + struct file *interp_file = NULL; char iname[BINPRM_BUF_SIZE]; const char *iname_addr = iname; int retval; @@ -142,7 +133,7 @@ static int load_misc_binary(struct linux_binprm *bprm) retval = -ENOEXEC; if (!enabled) - goto _ret; + goto ret; /* to keep locking time low, we copy the interpreter string */ read_lock(_lock); @@ -151,25 +142,26 @@ static int load_misc_binary(struct linux_binprm *bprm) strlcpy(iname, fmt->interpreter, BINPRM_BUF_SIZE); read_unlock(_lock); if (!fmt) - goto _ret; + goto ret; if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) { retval = remove_arg_zero(bprm); if (retval) - goto _ret; + goto ret; } if (fmt->flags & MISC_FMT_OPEN_BINARY) { /* if the binary should be opened on behalf of the * interpreter than keep it open and assign descriptor -* to it */ - fd_binary = get_unused_fd(); - if (fd_binary < 0) { - retval = fd_binary; - goto _ret; - } - fd_install(fd_binary, bprm->file); +* to it +*/ + fd_binary = get_unused_fd(); + if (fd_binary < 0) { + retval = fd_binary; + goto ret; + } + fd_install(fd_binary, bprm->file); /* if the binary is not readable than enforce mm->dumpable=0 regardless of the interpreter's permissions */ @@ -182,32 +174,32 @@ static int load_misc_binary(struct linux_binprm *bprm) bprm->interp_flags |= BINPRM_FLAGS_EXECFD; bprm->interp_data = fd_binary; - } else { - allow_write_access(bprm->file); - fput(bprm->file); - bprm->file = NULL; - } + } else { + allow_write_access(bprm->file); + fput(bprm->file); + bprm->file = NULL; + } /* make argv[1] be the path to the binary */ - retval = copy_strings_kernel (1, >interp, bprm); + retval = copy_strings_kernel(1, >interp, bprm); if (retval < 0) - goto _error; + goto error; bprm->argc++; /* add the interp as argv[0] */ - retval = copy_strings_kernel (1, _addr, bprm); + retval = copy_strings_kernel(1, _addr, bprm); if (retval < 0) - goto _error; - bprm->argc ++; + goto error; + bprm->argc++; /* Update interp in case binfmt_script needs it. */ retval = bprm_change_interp(iname, bprm); if (retval < 0) - goto _error; + goto error; - interp_file = open_exec (iname); - retval = PTR_ERR (interp_file); - if (I
[PATCH 1/2 v2] binfmt_misc: add comments & debug logs
When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It let's you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. Some example output: $ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' > register $ dmesg binfmt_misc: register: received 92 bytes binfmt_misc: register: delim: 0x3a {:} binfmt_misc: register: name: {qemu-foo} binfmt_misc: register: type: M (magic) binfmt_misc: register: offset: 0x0 binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\ binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00. binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00 binfmt_misc: register: mask[raw]: 00 . binfmt_misc: register: magic/mask length: 8 binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF binfmt_misc: register: interpreter: {/usr/bin/qemu-foo} binfmt_misc: register: flag: P (preserve argv0) binfmt_misc: register: flag: O (open binary) binfmt_misc: register: flag: C (preserve creds) The [raw] lines show us exactly what was received from userspace. The lines after that show us how the kernel has decoded things. Signed-off-by: Mike Frysinger --- v2: - add explicit trailing \n to all the pr_debug lines fs/binfmt_misc.c | 136 +-- 1 file changed, 121 insertions(+), 15 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index fd8beb9..5670b17 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -16,6 +16,8 @@ * 2001-02-28 AV: rewritten into something that resembles C. Original didn't. */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include @@ -30,8 +32,13 @@ #include #include #include +#include -#include +#ifdef DEBUG +# define USE_DEBUG 1 +#else +# define USE_DEBUG 0 +#endif enum { VERBOSE_STATUS = 1 /* make it zero to save 400 bytes kernel memory */ @@ -87,20 +94,24 @@ static Node *check_file(struct linux_binprm *bprm) char *p = strrchr(bprm->interp, '.'); struct list_head *l; + /* Walk all the registered handlers. */ list_for_each(l, ) { Node *e = list_entry(l, Node, list); char *s; int j; + /* Make sure this one is currently enabled. */ if (!test_bit(Enabled, >flags)) continue; + /* Do matching based on extension if applicable. */ if (!test_bit(Magic, >flags)) { if (p && !strcmp(e->magic, p + 1)) return e; continue; } + /* Do matching based on magic & mask. */ s = bprm->buf + e->offset; if (e->mask) { for (j = 0; j < e->size; j++) @@ -259,14 +270,17 @@ static char * check_special_flags (char * sfs, Node * e) while (cont) { switch (*p) { case 'P': + pr_debug("register: flag: P (preserve argv0)\n"); p++; e->flags |= MISC_FMT_PRESERVE_ARGV0; break; case 'O': + pr_debug("register: flag: O (open binary)\n"); p++; e->flags |= MISC_FMT_OPEN_BINARY; break; case 'C': + pr_debug("register: flag: C (preserve creds)\n"); p++; /* this flags also implies the open-binary flag */ @@ -292,6 +306,8 @@ s
Re: [PATCH 1/2] binfmt_misc: add comments & debug logs
On 19 Oct 2014 17:41, Joe Perches wrote: > On Sun, 2014-10-19 at 19:03 -0400, Mike Frysinger wrote: > > let's deploy extensive pr_debug markers at > > logical parse points, and add comments to the dense parsing logic. > [] > > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c > [] > > @@ -259,14 +270,17 @@ static char * check_special_flags (char * sfs, Node * > > e) > > while (cont) { > > switch (*p) { > > case 'P': > > + pr_debug("register: flag: P (preserve argv0)"); > > Missing '\n' newline. > Can you please add them as appropriate? hmm, guess i was relying on pr_cont behavior to do the right thing. will do. -mike signature.asc Description: Digital signature
Re: [PATCH 1/2] binfmt_misc: add comments debug logs
On 19 Oct 2014 17:41, Joe Perches wrote: On Sun, 2014-10-19 at 19:03 -0400, Mike Frysinger wrote: let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. [] diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c [] @@ -259,14 +270,17 @@ static char * check_special_flags (char * sfs, Node * e) while (cont) { switch (*p) { case 'P': + pr_debug(register: flag: P (preserve argv0)); Missing '\n' newline. Can you please add them as appropriate? hmm, guess i was relying on pr_cont behavior to do the right thing. will do. -mike signature.asc Description: Digital signature
[PATCH 1/2 v2] binfmt_misc: add comments debug logs
When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It let's you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. Some example output: $ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' register $ dmesg binfmt_misc: register: received 92 bytes binfmt_misc: register: delim: 0x3a {:} binfmt_misc: register: name: {qemu-foo} binfmt_misc: register: type: M (magic) binfmt_misc: register: offset: 0x0 binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\ binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00. binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00 binfmt_misc: register: mask[raw]: 00 . binfmt_misc: register: magic/mask length: 8 binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF binfmt_misc: register: interpreter: {/usr/bin/qemu-foo} binfmt_misc: register: flag: P (preserve argv0) binfmt_misc: register: flag: O (open binary) binfmt_misc: register: flag: C (preserve creds) The [raw] lines show us exactly what was received from userspace. The lines after that show us how the kernel has decoded things. Signed-off-by: Mike Frysinger vap...@gentoo.org --- v2: - add explicit trailing \n to all the pr_debug lines fs/binfmt_misc.c | 136 +-- 1 file changed, 121 insertions(+), 15 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index fd8beb9..5670b17 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -16,6 +16,8 @@ * 2001-02-28 AV: rewritten into something that resembles C. Original didn't. */ +#define pr_fmt(fmt) KBUILD_MODNAME : fmt + #include linux/module.h #include linux/init.h #include linux/sched.h @@ -30,8 +32,13 @@ #include linux/mount.h #include linux/syscalls.h #include linux/fs.h +#include linux/uaccess.h -#include asm/uaccess.h +#ifdef DEBUG +# define USE_DEBUG 1 +#else +# define USE_DEBUG 0 +#endif enum { VERBOSE_STATUS = 1 /* make it zero to save 400 bytes kernel memory */ @@ -87,20 +94,24 @@ static Node *check_file(struct linux_binprm *bprm) char *p = strrchr(bprm-interp, '.'); struct list_head *l; + /* Walk all the registered handlers. */ list_for_each(l, entries) { Node *e = list_entry(l, Node, list); char *s; int j; + /* Make sure this one is currently enabled. */ if (!test_bit(Enabled, e-flags)) continue; + /* Do matching based on extension if applicable. */ if (!test_bit(Magic, e-flags)) { if (p !strcmp(e-magic, p + 1)) return e; continue; } + /* Do matching based on magic mask. */ s = bprm-buf + e-offset; if (e-mask) { for (j = 0; j e-size; j++) @@ -259,14 +270,17 @@ static char * check_special_flags (char * sfs, Node * e) while (cont) { switch (*p) { case 'P': + pr_debug(register: flag: P (preserve argv0)\n); p++; e-flags |= MISC_FMT_PRESERVE_ARGV0; break; case 'O': + pr_debug(register: flag: O (open binary)\n); p++; e-flags |= MISC_FMT_OPEN_BINARY; break; case 'C': + pr_debug(register: flag: C (preserve creds)\n); p++; /* this flags also implies the open-binary flag
[PATCH 2/2 v2] binfmt_misc: clean up code style a bit
Clean up various coding style issues that checkpatch complains about. No functional changes here. Signed-off-by: Mike Frysinger vap...@gentoo.org --- v2 - rebased fs/binfmt_misc.c | 295 +++ 1 file changed, 146 insertions(+), 149 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index 5670b17..a953458 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -1,19 +1,10 @@ /* - * binfmt_misc.c + * binfmt_misc.c * - * Copyright (C) 1997 Richard Günther + * Copyright (C) 1997 Richard Günther * - * binfmt_misc detects binaries via a magic or filename extension and invokes - * a specified wrapper. This should obsolete binfmt_java, binfmt_em86 and - * binfmt_mz. - * - * 1997-04-25 first version - * [...] - * 1997-05-19 cleanup - * 1997-06-26 hpa: pass the real filename rather than argv[0] - * 1997-06-30 minor cleanup - * 1997-08-09 removed extension stripping, locking cleanup - * 2001-02-28 AV: rewritten into something that resembles C. Original didn't. + * binfmt_misc detects binaries via a magic or filename extension and invokes + * a specified wrapper. See Documentation/binfmt_misc.txt for more details. */ #define pr_fmt(fmt) KBUILD_MODNAME : fmt @@ -48,9 +39,9 @@ static LIST_HEAD(entries); static int enabled = 1; enum {Enabled, Magic}; -#define MISC_FMT_PRESERVE_ARGV0 (131) -#define MISC_FMT_OPEN_BINARY (130) -#define MISC_FMT_CREDENTIALS (129) +#define MISC_FMT_PRESERVE_ARGV0 (1 31) +#define MISC_FMT_OPEN_BINARY (1 30) +#define MISC_FMT_CREDENTIALS (1 29) typedef struct { struct list_head list; @@ -134,7 +125,7 @@ static Node *check_file(struct linux_binprm *bprm) static int load_misc_binary(struct linux_binprm *bprm) { Node *fmt; - struct file * interp_file = NULL; + struct file *interp_file = NULL; char iname[BINPRM_BUF_SIZE]; const char *iname_addr = iname; int retval; @@ -142,7 +133,7 @@ static int load_misc_binary(struct linux_binprm *bprm) retval = -ENOEXEC; if (!enabled) - goto _ret; + goto ret; /* to keep locking time low, we copy the interpreter string */ read_lock(entries_lock); @@ -151,25 +142,26 @@ static int load_misc_binary(struct linux_binprm *bprm) strlcpy(iname, fmt-interpreter, BINPRM_BUF_SIZE); read_unlock(entries_lock); if (!fmt) - goto _ret; + goto ret; if (!(fmt-flags MISC_FMT_PRESERVE_ARGV0)) { retval = remove_arg_zero(bprm); if (retval) - goto _ret; + goto ret; } if (fmt-flags MISC_FMT_OPEN_BINARY) { /* if the binary should be opened on behalf of the * interpreter than keep it open and assign descriptor -* to it */ - fd_binary = get_unused_fd(); - if (fd_binary 0) { - retval = fd_binary; - goto _ret; - } - fd_install(fd_binary, bprm-file); +* to it +*/ + fd_binary = get_unused_fd(); + if (fd_binary 0) { + retval = fd_binary; + goto ret; + } + fd_install(fd_binary, bprm-file); /* if the binary is not readable than enforce mm-dumpable=0 regardless of the interpreter's permissions */ @@ -182,32 +174,32 @@ static int load_misc_binary(struct linux_binprm *bprm) bprm-interp_flags |= BINPRM_FLAGS_EXECFD; bprm-interp_data = fd_binary; - } else { - allow_write_access(bprm-file); - fput(bprm-file); - bprm-file = NULL; - } + } else { + allow_write_access(bprm-file); + fput(bprm-file); + bprm-file = NULL; + } /* make argv[1] be the path to the binary */ - retval = copy_strings_kernel (1, bprm-interp, bprm); + retval = copy_strings_kernel(1, bprm-interp, bprm); if (retval 0) - goto _error; + goto error; bprm-argc++; /* add the interp as argv[0] */ - retval = copy_strings_kernel (1, iname_addr, bprm); + retval = copy_strings_kernel(1, iname_addr, bprm); if (retval 0) - goto _error; - bprm-argc ++; + goto error; + bprm-argc++; /* Update interp in case binfmt_script needs it. */ retval = bprm_change_interp(iname, bprm); if (retval 0) - goto _error; + goto error; - interp_file = open_exec (iname); - retval = PTR_ERR (interp_file); - if (IS_ERR (interp_file)) - goto _error; + interp_file = open_exec(iname); + retval = PTR_ERR
Re: [PATCH 1/2 v2] binfmt_misc: add comments debug logs
On 20 Oct 2014 15:59, Joe Perches wrote: On Mon, 2014-10-20 at 18:45 -0400, Mike Frysinger wrote: diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c [] @@ -323,46 +343,113 @@ static Node *create_entry(const char __user *buffer, size_t count) [] + if (e-mask) { + int i; + char *masked = kmalloc(e-size, GFP_USER); Why GFP_USER? Does it need it? mostly a copy paste from earlier in this func: e = kmalloc(memsize, GFP_USER); the code is running process context and this buffer is only for debugging on behalf of the user (and is shortly freed there after), so GFP_USER seemed appropriate. that said, i'm certainly not an expert here, so if the convention is to use GFP_KERNEL, it's easy enough to change. the kmalloc API doesn't seem to provide guidance. switch (res) { - case 1: clear_bit(Enabled, e-flags); + case 1: + /* Disable this handler. */ + clear_bit(Enabled, e-flags); break; - case 2: set_bit(Enabled, e-flags); + case 2: + /* Enable this handler. */ + set_bit(Enabled, e-flags); break; - case 3: root = dget(file-f_path.dentry-d_sb-s_root); + case 3: + /* Delete this handler. */ + root = dget(file-f_path.dentry-d_sb-s_root); mutex_lock(root-d_inode-i_mutex); kill_node(e); Maybe move the case indents one tab position left style fixes (for the whole file) are in the 2nd patch. i specifically tried to keep those changes to a minimum in this patch (functionality vs formatting). -mike signature.asc Description: Digital signature
[PATCH 1/2] binfmt_misc: add comments & debug logs
When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It let's you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. Some example output: $ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' > register $ dmesg binfmt_misc: register: received 92 bytes binfmt_misc: register: delim: 0x3a {:} binfmt_misc: register: name: {qemu-foo} binfmt_misc: register: type: M (magic) binfmt_misc: register: offset: 0x0 binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\ binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00. binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00 binfmt_misc: register: mask[raw]: 00 . binfmt_misc: register: magic/mask length: 8 binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF binfmt_misc: register: interpreter: {/usr/bin/qemu-foo} binfmt_misc: register: flag: P (preserve argv0) binfmt_misc: register: flag: O (open binary) binfmt_misc: register: flag: C (preserve creds) The [raw] lines show us exactly what was received from userspace. The lines after that show us how the kernel has decoded things. Signed-off-by: Mike Frysinger --- fs/binfmt_misc.c | 136 +-- 1 file changed, 121 insertions(+), 15 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index fd8beb9..dbf0ac5 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -16,6 +16,8 @@ * 2001-02-28 AV: rewritten into something that resembles C. Original didn't. */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include @@ -30,8 +32,13 @@ #include #include #include +#include -#include +#ifdef DEBUG +# define USE_DEBUG 1 +#else +# define USE_DEBUG 0 +#endif enum { VERBOSE_STATUS = 1 /* make it zero to save 400 bytes kernel memory */ @@ -87,20 +94,24 @@ static Node *check_file(struct linux_binprm *bprm) char *p = strrchr(bprm->interp, '.'); struct list_head *l; + /* Walk all the registered handlers. */ list_for_each(l, ) { Node *e = list_entry(l, Node, list); char *s; int j; + /* Make sure this one is currently enabled. */ if (!test_bit(Enabled, >flags)) continue; + /* Do matching based on extension if applicable. */ if (!test_bit(Magic, >flags)) { if (p && !strcmp(e->magic, p + 1)) return e; continue; } + /* Do matching based on magic & mask. */ s = bprm->buf + e->offset; if (e->mask) { for (j = 0; j < e->size; j++) @@ -259,14 +270,17 @@ static char * check_special_flags (char * sfs, Node * e) while (cont) { switch (*p) { case 'P': + pr_debug("register: flag: P (preserve argv0)"); p++; e->flags |= MISC_FMT_PRESERVE_ARGV0; break; case 'O': + pr_debug("register: flag: O (open binary)"); p++; e->flags |= MISC_FMT_OPEN_BINARY; break; case 'C': + pr_debug("register: flag: C (preserve creds)"); p++; /* this flags also implies the open-binary flag */ @@ -292,6 +306,8 @@ static Node *create_entry(const char __user *buffer, size_t count) c
[PATCH 2/2] binfmt_misc: clean up code style a bit
Clean up various coding style issues that checkpatch complains about. No functional changes here. Signed-off-by: Mike Frysinger --- fs/binfmt_misc.c | 295 +++ 1 file changed, 146 insertions(+), 149 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index dbf0ac5..acd3245 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -1,19 +1,10 @@ /* - * binfmt_misc.c + * binfmt_misc.c * - * Copyright (C) 1997 Richard Günther + * Copyright (C) 1997 Richard Günther * - * binfmt_misc detects binaries via a magic or filename extension and invokes - * a specified wrapper. This should obsolete binfmt_java, binfmt_em86 and - * binfmt_mz. - * - * 1997-04-25 first version - * [...] - * 1997-05-19 cleanup - * 1997-06-26 hpa: pass the real filename rather than argv[0] - * 1997-06-30 minor cleanup - * 1997-08-09 removed extension stripping, locking cleanup - * 2001-02-28 AV: rewritten into something that resembles C. Original didn't. + * binfmt_misc detects binaries via a magic or filename extension and invokes + * a specified wrapper. See Documentation/binfmt_misc.txt for more details. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -48,9 +39,9 @@ static LIST_HEAD(entries); static int enabled = 1; enum {Enabled, Magic}; -#define MISC_FMT_PRESERVE_ARGV0 (1<<31) -#define MISC_FMT_OPEN_BINARY (1<<30) -#define MISC_FMT_CREDENTIALS (1<<29) +#define MISC_FMT_PRESERVE_ARGV0 (1 << 31) +#define MISC_FMT_OPEN_BINARY (1 << 30) +#define MISC_FMT_CREDENTIALS (1 << 29) typedef struct { struct list_head list; @@ -134,7 +125,7 @@ static Node *check_file(struct linux_binprm *bprm) static int load_misc_binary(struct linux_binprm *bprm) { Node *fmt; - struct file * interp_file = NULL; + struct file *interp_file = NULL; char iname[BINPRM_BUF_SIZE]; const char *iname_addr = iname; int retval; @@ -142,7 +133,7 @@ static int load_misc_binary(struct linux_binprm *bprm) retval = -ENOEXEC; if (!enabled) - goto _ret; + goto ret; /* to keep locking time low, we copy the interpreter string */ read_lock(_lock); @@ -151,25 +142,26 @@ static int load_misc_binary(struct linux_binprm *bprm) strlcpy(iname, fmt->interpreter, BINPRM_BUF_SIZE); read_unlock(_lock); if (!fmt) - goto _ret; + goto ret; if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) { retval = remove_arg_zero(bprm); if (retval) - goto _ret; + goto ret; } if (fmt->flags & MISC_FMT_OPEN_BINARY) { /* if the binary should be opened on behalf of the * interpreter than keep it open and assign descriptor -* to it */ - fd_binary = get_unused_fd(); - if (fd_binary < 0) { - retval = fd_binary; - goto _ret; - } - fd_install(fd_binary, bprm->file); +* to it +*/ + fd_binary = get_unused_fd(); + if (fd_binary < 0) { + retval = fd_binary; + goto ret; + } + fd_install(fd_binary, bprm->file); /* if the binary is not readable than enforce mm->dumpable=0 regardless of the interpreter's permissions */ @@ -182,32 +174,32 @@ static int load_misc_binary(struct linux_binprm *bprm) bprm->interp_flags |= BINPRM_FLAGS_EXECFD; bprm->interp_data = fd_binary; - } else { - allow_write_access(bprm->file); - fput(bprm->file); - bprm->file = NULL; - } + } else { + allow_write_access(bprm->file); + fput(bprm->file); + bprm->file = NULL; + } /* make argv[1] be the path to the binary */ - retval = copy_strings_kernel (1, >interp, bprm); + retval = copy_strings_kernel(1, >interp, bprm); if (retval < 0) - goto _error; + goto error; bprm->argc++; /* add the interp as argv[0] */ - retval = copy_strings_kernel (1, _addr, bprm); + retval = copy_strings_kernel(1, _addr, bprm); if (retval < 0) - goto _error; - bprm->argc ++; + goto error; + bprm->argc++; /* Update interp in case binfmt_script needs it. */ retval = bprm_change_interp(iname, bprm); if (retval < 0) - goto _error; + goto error; - interp_file = open_exec (iname); - retval = PTR_ERR (interp_file); - if (IS_ERR (interp_file))
[PATCH 1/2] binfmt_misc: add comments debug logs
When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It let's you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. Some example output: $ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' register $ dmesg binfmt_misc: register: received 92 bytes binfmt_misc: register: delim: 0x3a {:} binfmt_misc: register: name: {qemu-foo} binfmt_misc: register: type: M (magic) binfmt_misc: register: offset: 0x0 binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\ binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00. binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00 binfmt_misc: register: mask[raw]: 00 . binfmt_misc: register: magic/mask length: 8 binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF binfmt_misc: register: interpreter: {/usr/bin/qemu-foo} binfmt_misc: register: flag: P (preserve argv0) binfmt_misc: register: flag: O (open binary) binfmt_misc: register: flag: C (preserve creds) The [raw] lines show us exactly what was received from userspace. The lines after that show us how the kernel has decoded things. Signed-off-by: Mike Frysinger vap...@gentoo.org --- fs/binfmt_misc.c | 136 +-- 1 file changed, 121 insertions(+), 15 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index fd8beb9..dbf0ac5 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -16,6 +16,8 @@ * 2001-02-28 AV: rewritten into something that resembles C. Original didn't. */ +#define pr_fmt(fmt) KBUILD_MODNAME : fmt + #include linux/module.h #include linux/init.h #include linux/sched.h @@ -30,8 +32,13 @@ #include linux/mount.h #include linux/syscalls.h #include linux/fs.h +#include linux/uaccess.h -#include asm/uaccess.h +#ifdef DEBUG +# define USE_DEBUG 1 +#else +# define USE_DEBUG 0 +#endif enum { VERBOSE_STATUS = 1 /* make it zero to save 400 bytes kernel memory */ @@ -87,20 +94,24 @@ static Node *check_file(struct linux_binprm *bprm) char *p = strrchr(bprm-interp, '.'); struct list_head *l; + /* Walk all the registered handlers. */ list_for_each(l, entries) { Node *e = list_entry(l, Node, list); char *s; int j; + /* Make sure this one is currently enabled. */ if (!test_bit(Enabled, e-flags)) continue; + /* Do matching based on extension if applicable. */ if (!test_bit(Magic, e-flags)) { if (p !strcmp(e-magic, p + 1)) return e; continue; } + /* Do matching based on magic mask. */ s = bprm-buf + e-offset; if (e-mask) { for (j = 0; j e-size; j++) @@ -259,14 +270,17 @@ static char * check_special_flags (char * sfs, Node * e) while (cont) { switch (*p) { case 'P': + pr_debug(register: flag: P (preserve argv0)); p++; e-flags |= MISC_FMT_PRESERVE_ARGV0; break; case 'O': + pr_debug(register: flag: O (open binary)); p++; e-flags |= MISC_FMT_OPEN_BINARY; break; case 'C': + pr_debug(register: flag: C (preserve creds)); p++; /* this flags also implies the open-binary flag */ @@ -292,6 +306,8 @@ static Node *create_entry(const char __user *buffer, size_t
[PATCH 2/2] binfmt_misc: clean up code style a bit
Clean up various coding style issues that checkpatch complains about. No functional changes here. Signed-off-by: Mike Frysinger vap...@gentoo.org --- fs/binfmt_misc.c | 295 +++ 1 file changed, 146 insertions(+), 149 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index dbf0ac5..acd3245 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -1,19 +1,10 @@ /* - * binfmt_misc.c + * binfmt_misc.c * - * Copyright (C) 1997 Richard Günther + * Copyright (C) 1997 Richard Günther * - * binfmt_misc detects binaries via a magic or filename extension and invokes - * a specified wrapper. This should obsolete binfmt_java, binfmt_em86 and - * binfmt_mz. - * - * 1997-04-25 first version - * [...] - * 1997-05-19 cleanup - * 1997-06-26 hpa: pass the real filename rather than argv[0] - * 1997-06-30 minor cleanup - * 1997-08-09 removed extension stripping, locking cleanup - * 2001-02-28 AV: rewritten into something that resembles C. Original didn't. + * binfmt_misc detects binaries via a magic or filename extension and invokes + * a specified wrapper. See Documentation/binfmt_misc.txt for more details. */ #define pr_fmt(fmt) KBUILD_MODNAME : fmt @@ -48,9 +39,9 @@ static LIST_HEAD(entries); static int enabled = 1; enum {Enabled, Magic}; -#define MISC_FMT_PRESERVE_ARGV0 (131) -#define MISC_FMT_OPEN_BINARY (130) -#define MISC_FMT_CREDENTIALS (129) +#define MISC_FMT_PRESERVE_ARGV0 (1 31) +#define MISC_FMT_OPEN_BINARY (1 30) +#define MISC_FMT_CREDENTIALS (1 29) typedef struct { struct list_head list; @@ -134,7 +125,7 @@ static Node *check_file(struct linux_binprm *bprm) static int load_misc_binary(struct linux_binprm *bprm) { Node *fmt; - struct file * interp_file = NULL; + struct file *interp_file = NULL; char iname[BINPRM_BUF_SIZE]; const char *iname_addr = iname; int retval; @@ -142,7 +133,7 @@ static int load_misc_binary(struct linux_binprm *bprm) retval = -ENOEXEC; if (!enabled) - goto _ret; + goto ret; /* to keep locking time low, we copy the interpreter string */ read_lock(entries_lock); @@ -151,25 +142,26 @@ static int load_misc_binary(struct linux_binprm *bprm) strlcpy(iname, fmt-interpreter, BINPRM_BUF_SIZE); read_unlock(entries_lock); if (!fmt) - goto _ret; + goto ret; if (!(fmt-flags MISC_FMT_PRESERVE_ARGV0)) { retval = remove_arg_zero(bprm); if (retval) - goto _ret; + goto ret; } if (fmt-flags MISC_FMT_OPEN_BINARY) { /* if the binary should be opened on behalf of the * interpreter than keep it open and assign descriptor -* to it */ - fd_binary = get_unused_fd(); - if (fd_binary 0) { - retval = fd_binary; - goto _ret; - } - fd_install(fd_binary, bprm-file); +* to it +*/ + fd_binary = get_unused_fd(); + if (fd_binary 0) { + retval = fd_binary; + goto ret; + } + fd_install(fd_binary, bprm-file); /* if the binary is not readable than enforce mm-dumpable=0 regardless of the interpreter's permissions */ @@ -182,32 +174,32 @@ static int load_misc_binary(struct linux_binprm *bprm) bprm-interp_flags |= BINPRM_FLAGS_EXECFD; bprm-interp_data = fd_binary; - } else { - allow_write_access(bprm-file); - fput(bprm-file); - bprm-file = NULL; - } + } else { + allow_write_access(bprm-file); + fput(bprm-file); + bprm-file = NULL; + } /* make argv[1] be the path to the binary */ - retval = copy_strings_kernel (1, bprm-interp, bprm); + retval = copy_strings_kernel(1, bprm-interp, bprm); if (retval 0) - goto _error; + goto error; bprm-argc++; /* add the interp as argv[0] */ - retval = copy_strings_kernel (1, iname_addr, bprm); + retval = copy_strings_kernel(1, iname_addr, bprm); if (retval 0) - goto _error; - bprm-argc ++; + goto error; + bprm-argc++; /* Update interp in case binfmt_script needs it. */ retval = bprm_change_interp(iname, bprm); if (retval 0) - goto _error; + goto error; - interp_file = open_exec (iname); - retval = PTR_ERR (interp_file); - if (IS_ERR (interp_file)) - goto _error; + interp_file = open_exec(iname); + retval = PTR_ERR(interp_file
[PATCH 1/2] binfmt_misc: expand the register format limit to 1920 bytes
The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger --- Documentation/binfmt_misc.txt | 2 +- fs/binfmt_misc.c | 19 +-- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt index c1ed694..f64372b 100644 --- a/Documentation/binfmt_misc.txt +++ b/Documentation/binfmt_misc.txt @@ -58,7 +58,7 @@ Here is what the fields mean: There are some restrictions: - - the whole register string may not exceed 255 characters + - the whole register string may not exceed 1920 characters - the magic must reside in the first 128 bytes of the file, i.e. offset+size(magic) has to be less than 128 - the interpreter string may not exceed 127 characters diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index b605003..c96b855 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -62,7 +62,22 @@ static struct file_system_type bm_fs_type; static struct vfsmount *bm_mnt; static int entry_count; -/* +/* + * Max length o
[PATCH 2/2] binfmt_misc: touch up documentation a bit
Line wrap the content to 80 cols, and add more details to various fields to match the code. Drop reference to a website that does not exist anymore. Signed-off-by: Mike Frysinger --- Documentation/binfmt_misc.txt | 48 +-- 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt index f64372b..6b1de70 100644 --- a/Documentation/binfmt_misc.txt +++ b/Documentation/binfmt_misc.txt @@ -15,39 +15,50 @@ First you must mount binfmt_misc: mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc To actually register a new binary type, you have to set up a string looking like -:name:type:offset:magic:mask:interpreter:flags (where you can choose the ':' upon -your needs) and echo it to /proc/sys/fs/binfmt_misc/register. +:name:type:offset:magic:mask:interpreter:flags (where you can choose the ':' +upon your needs) and echo it to /proc/sys/fs/binfmt_misc/register. + Here is what the fields mean: - 'name' is an identifier string. A new /proc file will be created with this - name below /proc/sys/fs/binfmt_misc + name below /proc/sys/fs/binfmt_misc; cannot contain slashes '/' for obvious + reasons. - 'type' is the type of recognition. Give 'M' for magic and 'E' for extension. - 'offset' is the offset of the magic/mask in the file, counted in bytes. This - defaults to 0 if you omit it (i.e. you write ':name:type::magic...') + defaults to 0 if you omit it (i.e. you write ':name:type::magic...'). Ignored + when using filename extension matching. - 'magic' is the byte sequence binfmt_misc is matching for. The magic string - may contain hex-encoded characters like \x0a or \xA4. In a shell environment - you will have to write \\x0a to prevent the shell from eating your \. + may contain hex-encoded characters like \x0a or \xA4. Note that you must + escape any NUL bytes; parsing halts at the first one. In a shell environment + you might have to write \\x0a to prevent the shell from eating your \. If you chose filename extension matching, this is the extension to be recognised (without the '.', the \x0a specials are not allowed). Extension - matching is case sensitive! + matching is case sensitive, and slashes '/' are not allowed! - 'mask' is an (optional, defaults to all 0xff) mask. You can mask out some bits from matching by supplying a string like magic and as long as magic. - The mask is anded with the byte sequence of the file. + The mask is anded with the byte sequence of the file. Note that you must + escape any NUL bytes; parsing halts at the first one. Ignored when using + filename extension matching. - 'interpreter' is the program that should be invoked with the binary as first argument (specify the full path) - 'flags' is an optional field that controls several aspects of the invocation - of the interpreter. It is a string of capital letters, each controls a certain - aspect. The following flags are supported - - 'P' - preserve-argv[0]. Legacy behavior of binfmt_misc is to overwrite the -original argv[0] with the full path to the binary. When this flag is -included, binfmt_misc will add an argument to the argument vector for -this purpose, thus preserving the original argv[0]. + of the interpreter. It is a string of capital letters, each controls a + certain aspect. The following flags are supported - + 'P' - preserve-argv[0]. Legacy behavior of binfmt_misc is to overwrite +the original argv[0] with the full path to the binary. When this +flag is included, binfmt_misc will add an argument to the argument +vector for this purpose, thus preserving the original argv[0]. +e.g. If your interp is set to /bin/foo and you run `blah` (which is +in /usr/local/bin), then the kernel will execute /bin/foo with +argv[] set to ["/bin/foo", "/usr/local/bin/blah", "blah"]. The +interp has to be aware of this so it can execute /usr/local/bin/blah +with argv[] set to ["blah"]. 'O' - open-binary. Legacy behavior of binfmt_misc is to pass the full path of the binary to the interpreter as an argument. When this flag is included, binfmt_misc will open the file for reading and pass its descriptor as an argument, instead of the full path, thus allowing -the interpreter to execute non-readable binaries. This feature should -be used with care - the interpreter has to be trusted not to emit -the contents of the non-readable binary. +the interpreter to execute non-readable binaries. This feature +should be used with care - the interpreter has to be trusted not to +emit the contents of the non-readable binary. 'C' - credentials
[PATCH 2/2] binfmt_misc: touch up documentation a bit
Line wrap the content to 80 cols, and add more details to various fields to match the code. Drop reference to a website that does not exist anymore. Signed-off-by: Mike Frysinger vap...@gentoo.org --- Documentation/binfmt_misc.txt | 48 +-- 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt index f64372b..6b1de70 100644 --- a/Documentation/binfmt_misc.txt +++ b/Documentation/binfmt_misc.txt @@ -15,39 +15,50 @@ First you must mount binfmt_misc: mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc To actually register a new binary type, you have to set up a string looking like -:name:type:offset:magic:mask:interpreter:flags (where you can choose the ':' upon -your needs) and echo it to /proc/sys/fs/binfmt_misc/register. +:name:type:offset:magic:mask:interpreter:flags (where you can choose the ':' +upon your needs) and echo it to /proc/sys/fs/binfmt_misc/register. + Here is what the fields mean: - 'name' is an identifier string. A new /proc file will be created with this - name below /proc/sys/fs/binfmt_misc + name below /proc/sys/fs/binfmt_misc; cannot contain slashes '/' for obvious + reasons. - 'type' is the type of recognition. Give 'M' for magic and 'E' for extension. - 'offset' is the offset of the magic/mask in the file, counted in bytes. This - defaults to 0 if you omit it (i.e. you write ':name:type::magic...') + defaults to 0 if you omit it (i.e. you write ':name:type::magic...'). Ignored + when using filename extension matching. - 'magic' is the byte sequence binfmt_misc is matching for. The magic string - may contain hex-encoded characters like \x0a or \xA4. In a shell environment - you will have to write \\x0a to prevent the shell from eating your \. + may contain hex-encoded characters like \x0a or \xA4. Note that you must + escape any NUL bytes; parsing halts at the first one. In a shell environment + you might have to write \\x0a to prevent the shell from eating your \. If you chose filename extension matching, this is the extension to be recognised (without the '.', the \x0a specials are not allowed). Extension - matching is case sensitive! + matching is case sensitive, and slashes '/' are not allowed! - 'mask' is an (optional, defaults to all 0xff) mask. You can mask out some bits from matching by supplying a string like magic and as long as magic. - The mask is anded with the byte sequence of the file. + The mask is anded with the byte sequence of the file. Note that you must + escape any NUL bytes; parsing halts at the first one. Ignored when using + filename extension matching. - 'interpreter' is the program that should be invoked with the binary as first argument (specify the full path) - 'flags' is an optional field that controls several aspects of the invocation - of the interpreter. It is a string of capital letters, each controls a certain - aspect. The following flags are supported - - 'P' - preserve-argv[0]. Legacy behavior of binfmt_misc is to overwrite the -original argv[0] with the full path to the binary. When this flag is -included, binfmt_misc will add an argument to the argument vector for -this purpose, thus preserving the original argv[0]. + of the interpreter. It is a string of capital letters, each controls a + certain aspect. The following flags are supported - + 'P' - preserve-argv[0]. Legacy behavior of binfmt_misc is to overwrite +the original argv[0] with the full path to the binary. When this +flag is included, binfmt_misc will add an argument to the argument +vector for this purpose, thus preserving the original argv[0]. +e.g. If your interp is set to /bin/foo and you run `blah` (which is +in /usr/local/bin), then the kernel will execute /bin/foo with +argv[] set to [/bin/foo, /usr/local/bin/blah, blah]. The +interp has to be aware of this so it can execute /usr/local/bin/blah +with argv[] set to [blah]. 'O' - open-binary. Legacy behavior of binfmt_misc is to pass the full path of the binary to the interpreter as an argument. When this flag is included, binfmt_misc will open the file for reading and pass its descriptor as an argument, instead of the full path, thus allowing -the interpreter to execute non-readable binaries. This feature should -be used with care - the interpreter has to be trusted not to emit -the contents of the non-readable binary. +the interpreter to execute non-readable binaries. This feature +should be used with care - the interpreter has to be trusted not to +emit the contents of the non-readable binary. 'C' - credentials. Currently, the behavior of binfmt_misc
[PATCH 1/2] binfmt_misc: expand the register format limit to 1920 bytes
The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes qemu-mipsel. The interp uses 20 bytes /usr/bin/qemu-mipsel. The type flags takes up 4 bytes. We need 7 bytes for the delimiter (usually :). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger vap...@gentoo.org --- Documentation/binfmt_misc.txt | 2 +- fs/binfmt_misc.c | 19 +-- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt index c1ed694..f64372b 100644 --- a/Documentation/binfmt_misc.txt +++ b/Documentation/binfmt_misc.txt @@ -58,7 +58,7 @@ Here is what the fields mean: There are some restrictions: - - the whole register string may not exceed 255 characters + - the whole register string may not exceed 1920 characters - the magic must reside in the first 128 bytes of the file, i.e. offset+size(magic) has to be less than 128 - the interpreter string may not exceed 127 characters diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index b605003..c96b855 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -62,7 +62,22 @@ static struct file_system_type bm_fs_type; static struct vfsmount *bm_mnt; static int entry_count; -/* +/* + * Max length of the register string. Determined by: + * - 7 delimiters
Re: [RFA][PATCH 24/27] Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
On Thu 26 Jun 2014 12:52:45 Steven Rostedt wrote: > From: "Steven Rostedt (Red Hat)" > > Nothing sets function_trace_stop to disable function tracing anymore. > Remove the check for it in the arch code. > > [ Please test this on your arch ] i haven't tested, but the asm looks correct -- it's only using scratch registers, and the rest of the code doesn't rely on the state in that ifdef (by design). Acked-by: Mike Frysinger -mike signature.asc Description: This is a digitally signed message part.
Re: [RFA][PATCH 24/27] Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
On Thu 26 Jun 2014 12:52:45 Steven Rostedt wrote: From: Steven Rostedt (Red Hat) rost...@goodmis.org Nothing sets function_trace_stop to disable function tracing anymore. Remove the check for it in the arch code. [ Please test this on your arch ] i haven't tested, but the asm looks correct -- it's only using scratch registers, and the rest of the code doesn't rely on the state in that ifdef (by design). Acked-by: Mike Frysinger vap...@gentoo.org -mike signature.asc Description: This is a digitally signed message part.
[tip:x86/x32] x86, x32: Use compat shims for io_{setup,submit}
Commit-ID: 7fd44dacdd803c0bbf38bf478d51d280902bb0f1 Gitweb: http://git.kernel.org/tip/7fd44dacdd803c0bbf38bf478d51d280902bb0f1 Author: Mike Frysinger AuthorDate: Sun, 4 May 2014 20:43:15 -0400 Committer: H. Peter Anvin CommitDate: Sun, 4 May 2014 17:49:22 -0700 x86, x32: Use compat shims for io_{setup,submit} The io_setup takes a pointer to a context id of type aio_context_t. This in turn is typed to a __kernel_ulong_t. We could tweak the exported headers to define this as a 64bit quantity for specific ABIs, but since we already have a 32bit compat shim for the x86 ABI, let's just re-use that logic. The libaio package is also written to expect this as a pointer type, so a compat shim would simplify that. The io_submit func operates on an array of pointers to iocb structs. Padding out the array to be 64bit aligned is a huge pain, so convert it over to the existing compat shim too. We don't convert io_getevents to the compat func as its only purpose is to handle the timespec struct, and the x32 ABI uses 64bit times. With this change, the libaio package can now pass its testsuite when built for the x32 ABI. Signed-off-by: Mike Frysinger Link: http://lkml.kernel.org/r/1399250595-5005-1-git-send-email-vap...@gentoo.org Cc: H.J. Lu Signed-off-by: H. Peter Anvin Cc: # v3.4+ --- arch/x86/syscalls/syscall_64.tbl | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index 04376ac..ec255a1 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -212,10 +212,10 @@ 203common sched_setaffinity sys_sched_setaffinity 204common sched_getaffinity sys_sched_getaffinity 20564 set_thread_area -206common io_setupsys_io_setup +20664 io_setupsys_io_setup 207common io_destroy sys_io_destroy 208common io_geteventssys_io_getevents -209common io_submit sys_io_submit +20964 io_submit sys_io_submit 210common io_cancel sys_io_cancel 21164 get_thread_area 212common lookup_dcookie sys_lookup_dcookie @@ -359,3 +359,5 @@ 540x32 process_vm_writev compat_sys_process_vm_writev 541x32 setsockopt compat_sys_setsockopt 542x32 getsockopt compat_sys_getsockopt +543x32 io_setupcompat_sys_io_setup +544x32 io_submit compat_sys_io_submit -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x32: use compat shims for io_{setup,submit}
The io_setup takes a pointer to a context id of type aio_context_t. This in turn is typed to a __kernel_ulong_t. We could tweak the exported headers to define this as a 64bit quantity for specific ABIs, but since we already have a 32bit compat shim for the x86 ABI, let's just re-use that logic. The libaio package is also written to expect this as a pointer type, so a compat shim would simplify that. The io_submit func operates on an array of pointers to iocb structs. Padding out the array to be 64bit aligned is a huge pain, so convert it over to the existing compat shim too. We don't convert io_getevents to the compat func as its only purpose is to handle the timespec struct, and the x32 ABI uses 64bit times. With this change, the libaio package can now pass its testsuite when built for the x32 ABI. Signed-off-by: Mike Frysinger --- arch/x86/syscalls/syscall_64.tbl | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index 04376ac..ec255a1 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -212,10 +212,10 @@ 203common sched_setaffinity sys_sched_setaffinity 204common sched_getaffinity sys_sched_getaffinity 20564 set_thread_area -206common io_setupsys_io_setup +20664 io_setupsys_io_setup 207common io_destroy sys_io_destroy 208common io_geteventssys_io_getevents -209common io_submit sys_io_submit +20964 io_submit sys_io_submit 210common io_cancel sys_io_cancel 21164 get_thread_area 212common lookup_dcookie sys_lookup_dcookie @@ -359,3 +359,5 @@ 540x32 process_vm_writev compat_sys_process_vm_writev 541x32 setsockopt compat_sys_setsockopt 542x32 getsockopt compat_sys_getsockopt +543x32 io_setupcompat_sys_io_setup +544x32 io_submit compat_sys_io_submit -- 1.9.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x32: use compat shims for io_{setup,submit}
The io_setup takes a pointer to a context id of type aio_context_t. This in turn is typed to a __kernel_ulong_t. We could tweak the exported headers to define this as a 64bit quantity for specific ABIs, but since we already have a 32bit compat shim for the x86 ABI, let's just re-use that logic. The libaio package is also written to expect this as a pointer type, so a compat shim would simplify that. The io_submit func operates on an array of pointers to iocb structs. Padding out the array to be 64bit aligned is a huge pain, so convert it over to the existing compat shim too. We don't convert io_getevents to the compat func as its only purpose is to handle the timespec struct, and the x32 ABI uses 64bit times. With this change, the libaio package can now pass its testsuite when built for the x32 ABI. Signed-off-by: Mike Frysinger vap...@gentoo.org --- arch/x86/syscalls/syscall_64.tbl | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index 04376ac..ec255a1 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -212,10 +212,10 @@ 203common sched_setaffinity sys_sched_setaffinity 204common sched_getaffinity sys_sched_getaffinity 20564 set_thread_area -206common io_setupsys_io_setup +20664 io_setupsys_io_setup 207common io_destroy sys_io_destroy 208common io_geteventssys_io_getevents -209common io_submit sys_io_submit +20964 io_submit sys_io_submit 210common io_cancel sys_io_cancel 21164 get_thread_area 212common lookup_dcookie sys_lookup_dcookie @@ -359,3 +359,5 @@ 540x32 process_vm_writev compat_sys_process_vm_writev 541x32 setsockopt compat_sys_setsockopt 542x32 getsockopt compat_sys_getsockopt +543x32 io_setupcompat_sys_io_setup +544x32 io_submit compat_sys_io_submit -- 1.9.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/x32] x86, x32: Use compat shims for io_{setup,submit}
Commit-ID: 7fd44dacdd803c0bbf38bf478d51d280902bb0f1 Gitweb: http://git.kernel.org/tip/7fd44dacdd803c0bbf38bf478d51d280902bb0f1 Author: Mike Frysinger vap...@gentoo.org AuthorDate: Sun, 4 May 2014 20:43:15 -0400 Committer: H. Peter Anvin h...@zytor.com CommitDate: Sun, 4 May 2014 17:49:22 -0700 x86, x32: Use compat shims for io_{setup,submit} The io_setup takes a pointer to a context id of type aio_context_t. This in turn is typed to a __kernel_ulong_t. We could tweak the exported headers to define this as a 64bit quantity for specific ABIs, but since we already have a 32bit compat shim for the x86 ABI, let's just re-use that logic. The libaio package is also written to expect this as a pointer type, so a compat shim would simplify that. The io_submit func operates on an array of pointers to iocb structs. Padding out the array to be 64bit aligned is a huge pain, so convert it over to the existing compat shim too. We don't convert io_getevents to the compat func as its only purpose is to handle the timespec struct, and the x32 ABI uses 64bit times. With this change, the libaio package can now pass its testsuite when built for the x32 ABI. Signed-off-by: Mike Frysinger vap...@gentoo.org Link: http://lkml.kernel.org/r/1399250595-5005-1-git-send-email-vap...@gentoo.org Cc: H.J. Lu hjl.to...@gmail.com Signed-off-by: H. Peter Anvin h...@zytor.com Cc: sta...@vger.kernel.org # v3.4+ --- arch/x86/syscalls/syscall_64.tbl | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index 04376ac..ec255a1 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -212,10 +212,10 @@ 203common sched_setaffinity sys_sched_setaffinity 204common sched_getaffinity sys_sched_getaffinity 20564 set_thread_area -206common io_setupsys_io_setup +20664 io_setupsys_io_setup 207common io_destroy sys_io_destroy 208common io_geteventssys_io_getevents -209common io_submit sys_io_submit +20964 io_submit sys_io_submit 210common io_cancel sys_io_cancel 21164 get_thread_area 212common lookup_dcookie sys_lookup_dcookie @@ -359,3 +359,5 @@ 540x32 process_vm_writev compat_sys_process_vm_writev 541x32 setsockopt compat_sys_setsockopt 542x32 getsockopt compat_sys_getsockopt +543x32 io_setupcompat_sys_io_setup +544x32 io_submit compat_sys_io_submit -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: For review: open_by_name_at(2) man page [v2]
On Tue 18 Mar 2014 13:55:15 Michael Kerrisk wrote: > The > .I flags > argument is a bit mask constructed by ORing together > zero or more of the following value: > .TP > .B AT_EMPTY_PATH > Allow > .I pathname > to be an empty string. > See above. > (which may have been obtained using the > .BR open (2) > .B O_PATH > flag). > .TP > .B AT_SYMLINK_FOLLOW > By default, > .BR name_to_handle_at () > does not dereference > .I pathname > if it is a symbolic link. > The flag > .B AT_SYMLINK_FOLLOW > can be specified in > .I flags > to cause > .I pathname > to be dereferenced if it is a symbolic link. this section is only talking about |flags|, and further this part is only talking about AT_SYMLINK_FOLLOW. so this last sentence sounds super redundant. how about reversing the sentence order so that both are implicit like is done in the openat() page and the description of O_NOFOLLOW ? > .B ENOTDIR > The file descriptor supplied in > .I dirfd > does not refer to a directory, > and it it is not the case that both "it" is duplicated > .SS Obtaining a persistent filesystem ID > The mount IDs in > .IR /proc/self/mountinfo > can be reused as filesystems are unmounted and mounted. > Therefore, the mount ID returned by > .BR name_to_handle_at (3) should be () and not (3) side note: this seems like an easy error to script for ... > $ \fBecho 'Kannst du bitte überlegen?' > cecilia.txt\fP aber, ich spreche kein Deutsch :( do we have a standard about sticking to english ? i wonder if people are more likely to be confused or to appreciate it ... > #define errExit(msg)do { perror(msg); exit(EXIT_FAILURE); \\ > } while (0) i wonder if err.h makes sense now that this is a man page for completely linux-specific syscalls :). and you use _GNU_SOURCE. > int > main(int argc, char *argv[]) > { > struct file_handle *fhp; > int mount_id, fhsize, s; > > if (argc < 2 || strcmp(argv[1], "\-\-help") == 0) { argc != 2 ? > /* Allocate file_handle structure */ > > fhsize = sizeof(struct file_handle *); pretty sure this is wrong as sizeof() here returns the size of a pointer, not the size of the struct. it's why i prefer the form: fhsize = sizeof(*fhp); less typing and harder to screw up by accident. granted, the case below won't crash since the kernel only reads/writes sizeof(unsigned int) and i'm not aware of any system where that is larger than sizeof(void *), but it's still wrong :). > s = name_to_handle_at(AT_FDCWD, argv[1], fhp, _id, 0); another personal style: create dedicated variables for each arg you unpack out of argv[1]. it's generally OK when you only take one arg, but when you get more than one, you end up flipping back and forth between the usage trying to figure out what index 1 represents instead of focusing on what the code is doing. const char *pathname = argv[1]; > fhsize = sizeof(struct file_handle) + fhp\->handle_bytes; fhsize += fhp->handle_bytes ? it's the same, but i think nicer ;) > /* Write mount ID, file handle size, and file handle to stdout, >for later reuse by t_open_by_handle_at.c */ > > if (write(STDOUT_FILENO, _id, sizeof(int)) != sizeof(int) || > write(STDOUT_FILENO, , sizeof(int)) != sizeof(int) || > write(STDOUT_FILENO, fhp, fhsize) != fhsize) { seems like a whole lot of code spew for a simple printf() ? you'd have to adjust the other program to use scanf(), but seems like the end result would be nicer for users to experiment with ? > static int > open_mount_path_by_id(int mount_id) > { > char *linep; > size_t lsize; > char mount_path[PATH_MAX]; > int fmnt_id, fnd, nread; could we buy a few more letters for these vars ? i guess fmnt_id is the filesystem mount id, and fnd is "find". also, getline() returns a ssize_t, not an int. > FILE *fp; > > fp = fopen("/proc/self/mountinfo", "r"); only one space before the = i would encourage using the "e" flag whenever possible in the hopes that someone might start using it in their own code base. fp = fopen("/proc/self/mountinfo", "re"); > for (fnd = 0; !fnd ; ) { in my experience, seems like a while() loop makes more sense when you're implementing a while() loop ... fnd = 0; while (!fnd) { > linep = NULL; > nread = getline(, , fp); this works, but it's unusual when using getline() as it kind of defeats the purpose of using the dyn allocation feature. fnd = 0; linep = NULL; while (!fnd) { nread = getline(, , fp); ... } free(linep); i don't think it complicates the code much more ? > if (nread == \-1) > break; > > nread = sscanf(linep, "%d %*d %*s %*s %s", _id, mount_path); indent is off here > return open(mount_path, O_RDONLY | O_DIRECTORY); O_CLOEXEC for funsies ? > int > main(int argc, char *argv[]) > { > struct
Re: For review: open_by_name_at(2) man page [v2]
On Tue 18 Mar 2014 13:55:15 Michael Kerrisk wrote: The .I flags argument is a bit mask constructed by ORing together zero or more of the following value: .TP .B AT_EMPTY_PATH Allow .I pathname to be an empty string. See above. (which may have been obtained using the .BR open (2) .B O_PATH flag). .TP .B AT_SYMLINK_FOLLOW By default, .BR name_to_handle_at () does not dereference .I pathname if it is a symbolic link. The flag .B AT_SYMLINK_FOLLOW can be specified in .I flags to cause .I pathname to be dereferenced if it is a symbolic link. this section is only talking about |flags|, and further this part is only talking about AT_SYMLINK_FOLLOW. so this last sentence sounds super redundant. how about reversing the sentence order so that both are implicit like is done in the openat() page and the description of O_NOFOLLOW ? .B ENOTDIR The file descriptor supplied in .I dirfd does not refer to a directory, and it it is not the case that both it is duplicated .SS Obtaining a persistent filesystem ID The mount IDs in .IR /proc/self/mountinfo can be reused as filesystems are unmounted and mounted. Therefore, the mount ID returned by .BR name_to_handle_at (3) should be () and not (3) side note: this seems like an easy error to script for ... $ \fBecho 'Kannst du bitte überlegen?' cecilia.txt\fP aber, ich spreche kein Deutsch :( do we have a standard about sticking to english ? i wonder if people are more likely to be confused or to appreciate it ... #define errExit(msg)do { perror(msg); exit(EXIT_FAILURE); \\ } while (0) i wonder if err.h makes sense now that this is a man page for completely linux-specific syscalls :). and you use _GNU_SOURCE. int main(int argc, char *argv[]) { struct file_handle *fhp; int mount_id, fhsize, s; if (argc 2 || strcmp(argv[1], \-\-help) == 0) { argc != 2 ? /* Allocate file_handle structure */ fhsize = sizeof(struct file_handle *); pretty sure this is wrong as sizeof() here returns the size of a pointer, not the size of the struct. it's why i prefer the form: fhsize = sizeof(*fhp); less typing and harder to screw up by accident. granted, the case below won't crash since the kernel only reads/writes sizeof(unsigned int) and i'm not aware of any system where that is larger than sizeof(void *), but it's still wrong :). s = name_to_handle_at(AT_FDCWD, argv[1], fhp, mount_id, 0); another personal style: create dedicated variables for each arg you unpack out of argv[1]. it's generally OK when you only take one arg, but when you get more than one, you end up flipping back and forth between the usage trying to figure out what index 1 represents instead of focusing on what the code is doing. const char *pathname = argv[1]; fhsize = sizeof(struct file_handle) + fhp\-handle_bytes; fhsize += fhp-handle_bytes ? it's the same, but i think nicer ;) /* Write mount ID, file handle size, and file handle to stdout, for later reuse by t_open_by_handle_at.c */ if (write(STDOUT_FILENO, mount_id, sizeof(int)) != sizeof(int) || write(STDOUT_FILENO, fhsize, sizeof(int)) != sizeof(int) || write(STDOUT_FILENO, fhp, fhsize) != fhsize) { seems like a whole lot of code spew for a simple printf() ? you'd have to adjust the other program to use scanf(), but seems like the end result would be nicer for users to experiment with ? static int open_mount_path_by_id(int mount_id) { char *linep; size_t lsize; char mount_path[PATH_MAX]; int fmnt_id, fnd, nread; could we buy a few more letters for these vars ? i guess fmnt_id is the filesystem mount id, and fnd is find. also, getline() returns a ssize_t, not an int. FILE *fp; fp = fopen(/proc/self/mountinfo, r); only one space before the = i would encourage using the e flag whenever possible in the hopes that someone might start using it in their own code base. fp = fopen(/proc/self/mountinfo, re); for (fnd = 0; !fnd ; ) { in my experience, seems like a while() loop makes more sense when you're implementing a while() loop ... fnd = 0; while (!fnd) { linep = NULL; nread = getline(linep, lsize, fp); this works, but it's unusual when using getline() as it kind of defeats the purpose of using the dyn allocation feature. fnd = 0; linep = NULL; while (!fnd) { nread = getline(linep, lsize, fp); ... } free(linep); i don't think it complicates the code much more ? if (nread == \-1) break; nread = sscanf(linep, %d %*d %*s %*s %s, fmnt_id, mount_path); indent is off here return open(mount_path, O_RDONLY | O_DIRECTORY); O_CLOEXEC for funsies ? int main(int argc, char *argv[]) { struct file_handle *fhp; int mount_id, fd, mount_fd, fhsize;
[PATCH] uapi: dn: pull in ioctl.h header
This header uses _IOW/_IOR defines but doesn't include ioctl.h for it. If you try to use this w/out including ioctl.h yourself, it can fail to build, so add the explicit include. Signed-off-by: Mike Frysinger --- include/uapi/linux/dn.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/dn.h b/include/uapi/linux/dn.h index 5fbdd3d..4295c74 100644 --- a/include/uapi/linux/dn.h +++ b/include/uapi/linux/dn.h @@ -1,6 +1,7 @@ #ifndef _LINUX_DN_H #define _LINUX_DN_H +#include #include #include -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] uapi: ppp-ioctl.h: pull in ppp_defs.h
This header uses enum NPmode but doesn't include ppp_defs.h. If you try to use this header w/out including the defs header first, it leads to a build failure. So add the explicit include to fix it. Don't know of any packages directly impacted, but noticed while building some ppp code by hand. Signed-off-by: Mike Frysinger --- include/uapi/linux/ppp-ioctl.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h index 2d9a885..63a23a3 100644 --- a/include/uapi/linux/ppp-ioctl.h +++ b/include/uapi/linux/ppp-ioctl.h @@ -12,6 +12,7 @@ #include #include +#include /* * Bit definitions for flags argument to PPPIOCGFLAGS/PPPIOCSFLAGS. -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] uapi: convert u64 to __u64 in exported headers
The u64 type is not defined in any exported kernel headers, so trying to use it will lead to build failures. Signed-off-by: Mike Frysinger --- include/uapi/linux/nfs4.h | 2 +- include/uapi/linux/perf_event.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/nfs4.h b/include/uapi/linux/nfs4.h index 788128e..35f5f4c 100644 --- a/include/uapi/linux/nfs4.h +++ b/include/uapi/linux/nfs4.h @@ -150,7 +150,7 @@ #define NFS4_SECINFO_STYLE4_CURRENT_FH 0 #define NFS4_SECINFO_STYLE4_PARENT 1 -#define NFS4_MAX_UINT64(~(u64)0) +#define NFS4_MAX_UINT64(~(__u64)0) /* An NFS4 sessions server must support at least NFS4_MAX_OPS operations. * If a compound requires more operations, adjust NFS4_MAX_OPS accordingly. diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 959d454..7a3fed5 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -787,7 +787,7 @@ union perf_mem_data_src { #define PERF_MEM_TLB_SHIFT 26 #define PERF_MEM_S(a, s) \ - (((u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT) + (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT) /* * single taken branch record layout: -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] uapi: convert u64 to __u64 in exported headers
The u64 type is not defined in any exported kernel headers, so trying to use it will lead to build failures. Signed-off-by: Mike Frysinger vap...@gentoo.org --- include/uapi/linux/nfs4.h | 2 +- include/uapi/linux/perf_event.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/nfs4.h b/include/uapi/linux/nfs4.h index 788128e..35f5f4c 100644 --- a/include/uapi/linux/nfs4.h +++ b/include/uapi/linux/nfs4.h @@ -150,7 +150,7 @@ #define NFS4_SECINFO_STYLE4_CURRENT_FH 0 #define NFS4_SECINFO_STYLE4_PARENT 1 -#define NFS4_MAX_UINT64(~(u64)0) +#define NFS4_MAX_UINT64(~(__u64)0) /* An NFS4 sessions server must support at least NFS4_MAX_OPS operations. * If a compound requires more operations, adjust NFS4_MAX_OPS accordingly. diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 959d454..7a3fed5 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -787,7 +787,7 @@ union perf_mem_data_src { #define PERF_MEM_TLB_SHIFT 26 #define PERF_MEM_S(a, s) \ - (((u64)PERF_MEM_##a##_##s) PERF_MEM_##a##_SHIFT) + (((__u64)PERF_MEM_##a##_##s) PERF_MEM_##a##_SHIFT) /* * single taken branch record layout: -- 1.8.4.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] uapi: ppp-ioctl.h: pull in ppp_defs.h
This header uses enum NPmode but doesn't include ppp_defs.h. If you try to use this header w/out including the defs header first, it leads to a build failure. So add the explicit include to fix it. Don't know of any packages directly impacted, but noticed while building some ppp code by hand. Signed-off-by: Mike Frysinger vap...@gentoo.org --- include/uapi/linux/ppp-ioctl.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h index 2d9a885..63a23a3 100644 --- a/include/uapi/linux/ppp-ioctl.h +++ b/include/uapi/linux/ppp-ioctl.h @@ -12,6 +12,7 @@ #include linux/types.h #include linux/compiler.h +#include linux/ppp_defs.h /* * Bit definitions for flags argument to PPPIOCGFLAGS/PPPIOCSFLAGS. -- 1.8.4.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] uapi: dn: pull in ioctl.h header
This header uses _IOW/_IOR defines but doesn't include ioctl.h for it. If you try to use this w/out including ioctl.h yourself, it can fail to build, so add the explicit include. Signed-off-by: Mike Frysinger vap...@gentoo.org --- include/uapi/linux/dn.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/dn.h b/include/uapi/linux/dn.h index 5fbdd3d..4295c74 100644 --- a/include/uapi/linux/dn.h +++ b/include/uapi/linux/dn.h @@ -1,6 +1,7 @@ #ifndef _LINUX_DN_H #define _LINUX_DN_H +#include linux/ioctl.h #include linux/types.h #include linux/if_ether.h -- 1.8.4.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: blackfin + dmaengine: conflicting define/enum "DMA_COMPLETE"
On Saturday 11 January 2014 13:55:15 Marc Kleine-Budde wrote: > On 01/11/2014 07:31 PM, Randy Dunlap wrote: > > On 01/11/2014 10:09 AM, Marc Kleine-Budde wrote: > >> Hello, > >> > >> in current linux-next (and net-next) the compilation of the CAN > >> > >> drivers[1] with ARCH=blackfin fails with: > >>> CC [M] drivers/net/can/c_can/c_can.o > >>> > >>> In file included from linux/include/linux/netdevice.h:38:0, > >>> > >>> from linux/drivers/net/can/c_can/c_can.c:32: > >>> linux/include/linux/dmaengine.h:55:2: error: expected identifier before > >>> numeric constant linux/include/linux/dmaengine.h: In function > >>> 'dma_async_is_complete': linux/include/linux/dmaengine.h:1023:9: > >>> error: 'DMA_IN_PROGRESS' undeclared (first use in this function) > >>> linux/include/linux/dmaengine.h:1023:9: note: each undeclared > >>> identifier is reported only once for each function it appears in > >> > >> There are two locations where DMA_COMPLETE is defined: > >>> arch/blackfin/mach-bf548/include/mach/defBF547.h:602:#define > >>> DMA_COMPLETE 0x8/* DMA Complete */ > >>> arch/blackfin/mach-bf548/include/mach/defBF544.h:622:#define > >>>DMA_COMPLETE 0x8/* DMA Complete */ > >> > >> and > >> > >>> include/linux/dmaengine.h-enum dma_status { > >>> include/linux/dmaengine.h: DMA_COMPLETE, > >>> include/linux/dmaengine.h- DMA_IN_PROGRESS, > >>> include/linux/dmaengine.h- DMA_PAUSED, > >>> include/linux/dmaengine.h- DMA_ERROR, > >>> include/linux/dmaengine.h-}; > >> > >> What's the appropriate fix for the problem? > > > > arch/blackfin/mach-bf548/ needs a less generic name for its macro. > > Mike, is there a in tree user of blacksfin's DMA_COMPLETE? I cannot find > anyone. looks like those are defines for the host port peripheral on the BF54x. typically for peripherals we didn't have proper drivers for (like CAN and UART and SPI and such), we left the defines in the headers. those in turn matched the manual so people coming from other Blackfin environments (and reading the manuals) didn't have to figure out what name the Linux headers used. unfortunately, it leads to cases like this where the names are pretty bad. considering the host peripheral most likely never saw any serious use, it should be fine to delete all the bit defines in those headers related to those registers (i see HOST_{STATUS,CONTROL,TIMEOUT}. -mike signature.asc Description: This is a digitally signed message part.
Re: blackfin + dmaengine: conflicting define/enum DMA_COMPLETE
On Saturday 11 January 2014 13:55:15 Marc Kleine-Budde wrote: On 01/11/2014 07:31 PM, Randy Dunlap wrote: On 01/11/2014 10:09 AM, Marc Kleine-Budde wrote: Hello, in current linux-next (and net-next) the compilation of the CAN drivers[1] with ARCH=blackfin fails with: CC [M] drivers/net/can/c_can/c_can.o In file included from linux/include/linux/netdevice.h:38:0, from linux/drivers/net/can/c_can/c_can.c:32: linux/include/linux/dmaengine.h:55:2: error: expected identifier before numeric constant linux/include/linux/dmaengine.h: In function 'dma_async_is_complete': linux/include/linux/dmaengine.h:1023:9: error: 'DMA_IN_PROGRESS' undeclared (first use in this function) linux/include/linux/dmaengine.h:1023:9: note: each undeclared identifier is reported only once for each function it appears in There are two locations where DMA_COMPLETE is defined: arch/blackfin/mach-bf548/include/mach/defBF547.h:602:#define DMA_COMPLETE 0x8/* DMA Complete */ arch/blackfin/mach-bf548/include/mach/defBF544.h:622:#define DMA_COMPLETE 0x8/* DMA Complete */ and include/linux/dmaengine.h-enum dma_status { include/linux/dmaengine.h: DMA_COMPLETE, include/linux/dmaengine.h- DMA_IN_PROGRESS, include/linux/dmaengine.h- DMA_PAUSED, include/linux/dmaengine.h- DMA_ERROR, include/linux/dmaengine.h-}; What's the appropriate fix for the problem? arch/blackfin/mach-bf548/ needs a less generic name for its macro. Mike, is there a in tree user of blacksfin's DMA_COMPLETE? I cannot find anyone. looks like those are defines for the host port peripheral on the BF54x. typically for peripherals we didn't have proper drivers for (like CAN and UART and SPI and such), we left the defines in the headers. those in turn matched the manual so people coming from other Blackfin environments (and reading the manuals) didn't have to figure out what name the Linux headers used. unfortunately, it leads to cases like this where the names are pretty bad. considering the host peripheral most likely never saw any serious use, it should be fine to delete all the bit defines in those headers related to those registers (i see HOST_{STATUS,CONTROL,TIMEOUT}. -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] blackfin: Ignore generated uImages
On Monday 26 August 2013 07:59:05 Mark Brown wrote: > On Sat, Aug 24, 2013 at 06:01:44PM -0400, Mike Frysinger wrote: > > Acked-by: Mike Frysinger > > Thanks. Do you know who might be likely to apply this - MAINTAINERS > lists you? looks like they haven't updated the MAINTAINERS file. Steven is the maintainer last i heard. -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] blackfin: Ignore generated uImages
On Monday 26 August 2013 07:59:05 Mark Brown wrote: On Sat, Aug 24, 2013 at 06:01:44PM -0400, Mike Frysinger wrote: Acked-by: Mike Frysinger vap...@gentoo.org Thanks. Do you know who might be likely to apply this - MAINTAINERS lists you? looks like they haven't updated the MAINTAINERS file. Steven is the maintainer last i heard. -mike signature.asc Description: This is a digitally signed message part.
Re: [trivial PATCH] treewide: Fix printks with 0x%#
On Thursday 25 July 2013 14:53:25 Joe Perches wrote: > Using 0x%# emits 0x0x. Only one is necessary. sounds like a job for checkpatch.pl :) -mike signature.asc Description: This is a digitally signed message part.
Re: [trivial PATCH] treewide: Fix printks with 0x%#
On Thursday 25 July 2013 14:53:25 Joe Perches wrote: Using 0x%# emits 0x0x. Only one is necessary. sounds like a job for checkpatch.pl :) -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] ARM: Add ".text.unlikely" and ".text.hot" to arm unwind tables
Acked-by: Mike Frysinger -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] blackfin: Ignore generated uImages
Acked-by: Mike Frysinger -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] Omnikey Cardman 4000: pull in ioctl.h in user header
On Thursday 22 August 2013 18:55:45 Andrew Morton wrote: > On Fri, 16 Aug 2013 14:07:09 -0400 Mike Frysinger wrote: > > This file uses the ioctl helpers (_IOR/_IOW/etc...), so include ioctl.h > > for the definitions. > > > > ... > > > > --- a/include/uapi/linux/cm4000_cs.h > > +++ b/include/uapi/linux/cm4000_cs.h > > @@ -2,6 +2,7 @@ > > #define _UAPI_CM4000_H_ > > > > #include > > +#include > > > > #defineMAX_ATR 33 > > I'm assuming this fixes a build error under unknown circumstances, or > not, or something else? yes, it's to fix a build error. i don't have any specific source packages that are known to fail though. just doing analysis of the headers by grep/eye. -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] Omnikey Cardman 4000: pull in ioctl.h in user header
On Thursday 22 August 2013 18:55:45 Andrew Morton wrote: On Fri, 16 Aug 2013 14:07:09 -0400 Mike Frysinger wrote: This file uses the ioctl helpers (_IOR/_IOW/etc...), so include ioctl.h for the definitions. ... --- a/include/uapi/linux/cm4000_cs.h +++ b/include/uapi/linux/cm4000_cs.h @@ -2,6 +2,7 @@ #define _UAPI_CM4000_H_ #include linux/types.h +#include linux/ioctl.h #defineMAX_ATR 33 I'm assuming this fixes a build error under unknown circumstances, or not, or something else? yes, it's to fix a build error. i don't have any specific source packages that are known to fail though. just doing analysis of the headers by grep/eye. -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] blackfin: Ignore generated uImages
Acked-by: Mike Frysinger vap...@gentoo.org -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] ARM: Add .text.unlikely and .text.hot to arm unwind tables
Acked-by: Mike Frysinger vap...@gentoo.org -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH v2 0/8] turbostat: Build fixes, 32-bit support, and cleanups
Acked-by: Mike Frysinger -mik -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 6/8] turbostat: Factor out common function to open file and exit on failure
On Tue, Aug 20, 2013 at 8:20 PM, Josh Triplett wrote: > +FILE *fopen_or_die(const char *path, const char *mode) > +{ > + FILE *filep = fopen(path, "r"); not a big deal, but would be nice to add the "e" flag -mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 6/8] turbostat: Factor out common function to open file and exit on failure
On Tue, Aug 20, 2013 at 8:20 PM, Josh Triplett j...@joshtriplett.org wrote: +FILE *fopen_or_die(const char *path, const char *mode) +{ + FILE *filep = fopen(path, r); not a big deal, but would be nice to add the e flag -mike -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 0/8] turbostat: Build fixes, 32-bit support, and cleanups
Acked-by: Mike Frysinger vap...@chromium.org -mik -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Omnikey Cardman 4000: pull in ioctl.h in user header
This file uses the ioctl helpers (_IOR/_IOW/etc...), so include ioctl.h for the definitions. Signed-off-by: Mike Frysinger --- include/uapi/linux/cm4000_cs.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/cm4000_cs.h b/include/uapi/linux/cm4000_cs.h index bc51f77..1217f75 100644 --- a/include/uapi/linux/cm4000_cs.h +++ b/include/uapi/linux/cm4000_cs.h @@ -2,6 +2,7 @@ #define _UAPI_CM4000_H_ #include +#include #defineMAX_ATR 33 -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Omnikey Cardman 4000: pull in ioctl.h in user header
This file uses the ioctl helpers (_IOR/_IOW/etc...), so include ioctl.h for the definitions. Signed-off-by: Mike Frysinger vap...@gentoo.org --- include/uapi/linux/cm4000_cs.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/cm4000_cs.h b/include/uapi/linux/cm4000_cs.h index bc51f77..1217f75 100644 --- a/include/uapi/linux/cm4000_cs.h +++ b/include/uapi/linux/cm4000_cs.h @@ -2,6 +2,7 @@ #define _UAPI_CM4000_H_ #include linux/types.h +#include linux/ioctl.h #defineMAX_ATR 33 -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:perf/urgent] perf tools: Fix -ldw/ -lelf link test when static linking
Commit-ID: d14c496588733ec1b586ec068932c1db228dd770 Gitweb: http://git.kernel.org/tip/d14c496588733ec1b586ec068932c1db228dd770 Author: Mike Frysinger AuthorDate: Thu, 9 May 2013 00:17:44 -0400 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 10 Jul 2013 13:41:12 -0300 perf tools: Fix -ldw/-lelf link test when static linking Since libelf sometimes uses libpthread, we have to list that after -lelf when someone tries to build statically. Else things go boom: Makefile:479: *** No libelf.h/libelf found, please install \ libelf-dev/elfutils-libelf-devel. Stop. Similarly, the -ldw test fails as it often uses -lz: Makefile:462: No libdw.h found or old libdw.h found or elfutils is older \ than 0.138, disables dwarf support. Please install new elfutils-devel/libdw-dev And if we add debugging to try-cc, we see: + echo '#include int main(void) { Dwarf *dbg = dwarf_begin(0, DWARF_C_READ); return (long)dbg; }' + i686-pc-linux-gnu-gcc -x c - -O2 -pipe -march=atom -mtune=atom -mfpmath=sse -g \ -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE \ -ldw -lelf -static -lpthread -lrt -lelf -lm -o .24368 /usr/lib/libdw.a(dwarf_begin_elf.o):function check_section.isra.1: error: undefined reference to 'inflateInit_' /usr/lib/libdw.a(dwarf_begin_elf.o):function check_section.isra.1: error: undefined reference to 'inflate' /usr/lib/libdw.a(dwarf_begin_elf.o):function check_section.isra.1: error: undefined reference to 'inflateReset' /usr/lib/libdw.a(dwarf_begin_elf.o):function check_section.isra.1: error: undefined reference to 'inflateEnd' + echo '#include int main(void) { Elf *elf = elf_begin(0, ELF_C_READ, 0); return (long)elf; }' + i686-pc-linux-gnu-gcc -x c - -O2 -pipe -march=atom -mtune=atom -mfpmath=sse -g \ -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE \ -static -lpthread -lrt -lelf -lm -o .19216 /usr/lib/libelf.a(elf_begin.o):function file_read_elf: error: undefined reference to 'pthread_rwlock_init' /usr/lib/libelf.a(elf_begin.o):function __libelf_read_mmaped_file: error: undefined reference to 'pthread_rwlock_init' /usr/lib/libelf.a(elf_begin.o):function __libelf_read_mmaped_file: error: undefined reference to 'pthread_rwlock_init' /usr/lib/libelf.a(elf_begin.o):function read_file: error: undefined reference to 'pthread_rwlock_init' /usr/lib/libelf.a(elf_begin.o):function lock_dup_elf.8072: error: undefined reference to 'pthread_rwlock_unlock' /usr/lib/libelf.a(elf_begin.o):function lock_dup_elf.8072: error: undefined reference to 'pthread_rwlock_wrlock' /usr/lib/libelf.a(elf_begin.o):function elf_begin: error: undefined reference to 'pthread_rwlock_rdlock' /usr/lib/libelf.a(elf_begin.o):function elf_begin: error: undefined reference to 'pthread_rwlock_unlock' Signed-off-by: Mike Frysinger Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1368073064-18276-1-git-send-email-vap...@gentoo.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/lib/lk/Makefile | 2 +- tools/perf/config/Makefile | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/lib/lk/Makefile b/tools/lib/lk/Makefile index f0ecc0a..280dd82 100644 --- a/tools/lib/lk/Makefile +++ b/tools/lib/lk/Makefile @@ -29,7 +29,7 @@ LIB_OBJS += $(OUTPUT)debugfs.o LIBFILE = liblk.a CFLAGS = -ggdb3 -Wall -Wextra -std=gnu99 -Werror -O6 -D_FORTIFY_SOURCE=2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -fPIC -EXTLIBS = -lpthread -lrt -lelf -lm +EXTLIBS = -lelf -lpthread -lrt -lm ALL_CFLAGS = $(CFLAGS) $(BASIC_CFLAGS) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 ALL_LDFLAGS = $(LDFLAGS) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index f446895..b5d9238 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -85,7 +85,7 @@ CFLAGS += -Wall CFLAGS += -Wextra CFLAGS += -std=gnu99 -EXTLIBS = -lpthread -lrt -lelf -lm +EXTLIBS = -lelf -lpthread -lrt -lm ifeq ($(call try-cc,$(SOURCE_HELLO),$(CFLAGS) -Werror -fstack-protector-all,-fstack-protector-all),y) CFLAGS += -fstack-protector-all @@ -165,7 +165,7 @@ else LIBDW_LDFLAGS := -L$(LIBDW_DIR)/lib endif - FLAGS_DWARF=$(CFLAGS) $(LIBDW_CFLAGS) -ldw -lelf $(LIBDW_LDFLAGS) $(LDFLAGS) $(EXTLIBS) + FLAGS_DWARF=$(CFLAGS) $(LIBDW_CFLAGS) -ldw -lz -lelf $(LIBDW_LDFLAGS) $(LDFLAGS) $(EXTLIBS) ifneq ($(call try-cc,$(SOURCE_DWARF),$(FLAGS_DWARF),libdw),y) msg := $(warning No libdw.h found or old libdw.h found or elfutils is older than 0.138, disables dwarf support. Please install new elfutils-devel/libdw-dev); NO_DWARF := 1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:perf/urgent] perf tools: Fix -ldw/ -lelf link test when static linking
Commit-ID: d14c496588733ec1b586ec068932c1db228dd770 Gitweb: http://git.kernel.org/tip/d14c496588733ec1b586ec068932c1db228dd770 Author: Mike Frysinger vap...@chromium.org AuthorDate: Thu, 9 May 2013 00:17:44 -0400 Committer: Arnaldo Carvalho de Melo a...@redhat.com CommitDate: Wed, 10 Jul 2013 13:41:12 -0300 perf tools: Fix -ldw/-lelf link test when static linking Since libelf sometimes uses libpthread, we have to list that after -lelf when someone tries to build statically. Else things go boom: Makefile:479: *** No libelf.h/libelf found, please install \ libelf-dev/elfutils-libelf-devel. Stop. Similarly, the -ldw test fails as it often uses -lz: Makefile:462: No libdw.h found or old libdw.h found or elfutils is older \ than 0.138, disables dwarf support. Please install new elfutils-devel/libdw-dev And if we add debugging to try-cc, we see: + echo '#include dwarf.h int main(void) { Dwarf *dbg = dwarf_begin(0, DWARF_C_READ); return (long)dbg; }' + i686-pc-linux-gnu-gcc -x c - -O2 -pipe -march=atom -mtune=atom -mfpmath=sse -g \ -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE \ -ldw -lelf -static -lpthread -lrt -lelf -lm -o .24368 /usr/lib/libdw.a(dwarf_begin_elf.o):function check_section.isra.1: error: undefined reference to 'inflateInit_' /usr/lib/libdw.a(dwarf_begin_elf.o):function check_section.isra.1: error: undefined reference to 'inflate' /usr/lib/libdw.a(dwarf_begin_elf.o):function check_section.isra.1: error: undefined reference to 'inflateReset' /usr/lib/libdw.a(dwarf_begin_elf.o):function check_section.isra.1: error: undefined reference to 'inflateEnd' + echo '#include libelf.h int main(void) { Elf *elf = elf_begin(0, ELF_C_READ, 0); return (long)elf; }' + i686-pc-linux-gnu-gcc -x c - -O2 -pipe -march=atom -mtune=atom -mfpmath=sse -g \ -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE \ -static -lpthread -lrt -lelf -lm -o .19216 /usr/lib/libelf.a(elf_begin.o):function file_read_elf: error: undefined reference to 'pthread_rwlock_init' /usr/lib/libelf.a(elf_begin.o):function __libelf_read_mmaped_file: error: undefined reference to 'pthread_rwlock_init' /usr/lib/libelf.a(elf_begin.o):function __libelf_read_mmaped_file: error: undefined reference to 'pthread_rwlock_init' /usr/lib/libelf.a(elf_begin.o):function read_file: error: undefined reference to 'pthread_rwlock_init' /usr/lib/libelf.a(elf_begin.o):function lock_dup_elf.8072: error: undefined reference to 'pthread_rwlock_unlock' /usr/lib/libelf.a(elf_begin.o):function lock_dup_elf.8072: error: undefined reference to 'pthread_rwlock_wrlock' /usr/lib/libelf.a(elf_begin.o):function elf_begin: error: undefined reference to 'pthread_rwlock_rdlock' /usr/lib/libelf.a(elf_begin.o):function elf_begin: error: undefined reference to 'pthread_rwlock_unlock' Signed-off-by: Mike Frysinger vap...@gentoo.org Cc: Ingo Molnar mi...@redhat.com Cc: Paul Mackerras pau...@samba.org Cc: Peter Zijlstra a.p.zijls...@chello.nl Link: http://lkml.kernel.org/r/1368073064-18276-1-git-send-email-vap...@gentoo.org Signed-off-by: Arnaldo Carvalho de Melo a...@redhat.com --- tools/lib/lk/Makefile | 2 +- tools/perf/config/Makefile | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/lib/lk/Makefile b/tools/lib/lk/Makefile index f0ecc0a..280dd82 100644 --- a/tools/lib/lk/Makefile +++ b/tools/lib/lk/Makefile @@ -29,7 +29,7 @@ LIB_OBJS += $(OUTPUT)debugfs.o LIBFILE = liblk.a CFLAGS = -ggdb3 -Wall -Wextra -std=gnu99 -Werror -O6 -D_FORTIFY_SOURCE=2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -fPIC -EXTLIBS = -lpthread -lrt -lelf -lm +EXTLIBS = -lelf -lpthread -lrt -lm ALL_CFLAGS = $(CFLAGS) $(BASIC_CFLAGS) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 ALL_LDFLAGS = $(LDFLAGS) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index f446895..b5d9238 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -85,7 +85,7 @@ CFLAGS += -Wall CFLAGS += -Wextra CFLAGS += -std=gnu99 -EXTLIBS = -lpthread -lrt -lelf -lm +EXTLIBS = -lelf -lpthread -lrt -lm ifeq ($(call try-cc,$(SOURCE_HELLO),$(CFLAGS) -Werror -fstack-protector-all,-fstack-protector-all),y) CFLAGS += -fstack-protector-all @@ -165,7 +165,7 @@ else LIBDW_LDFLAGS := -L$(LIBDW_DIR)/lib endif - FLAGS_DWARF=$(CFLAGS) $(LIBDW_CFLAGS) -ldw -lelf $(LIBDW_LDFLAGS) $(LDFLAGS) $(EXTLIBS) + FLAGS_DWARF=$(CFLAGS) $(LIBDW_CFLAGS) -ldw -lz -lelf $(LIBDW_LDFLAGS) $(LDFLAGS) $(EXTLIBS) ifneq ($(call try-cc,$(SOURCE_DWARF),$(FLAGS_DWARF),libdw),y) msg := $(warning No libdw.h found or old libdw.h found or elfutils is older than 0.138, disables dwarf support. Please install new elfutils-devel/libdw-dev); NO_DWARF := 1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: Avoid extra calls to the C compiler
Acked-by: Mike Frysinger -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] ARM: Avoid extra calls to the C compiler
Acked-by: Mike Frysinger vap...@gentoo.org -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH 11/32] blackfin: delete __cpuinit usage from all blackfin files
Acked-by: Mike Frysinger -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH 11/32] blackfin: delete __cpuinit usage from all blackfin files
Acked-by: Mike Frysinger vap...@gentoo.org -mike signature.asc Description: This is a digitally signed message part.
Re: [uclinux-dist-devel] [PATCH 01/11] cpufreq: blackfin: enable driver for CONFIG_BFIN_CPU_FREQ
On Wednesday 12 June 2013 04:15:08 Viresh Kumar wrote: > By mistake blackfin's cpufreq driver is enabled when CONFIG_BLACKFIN was > present, whereas it should have been enabled only when CONFIG_BFIN_CPU_FREQ > is present. Acked-by: Mike Frysinger -mike signature.asc Description: This is a digitally signed message part.
Re: [uclinux-dist-devel] [PATCH 01/11] cpufreq: blackfin: enable driver for CONFIG_BFIN_CPU_FREQ
On Wednesday 12 June 2013 04:15:08 Viresh Kumar wrote: By mistake blackfin's cpufreq driver is enabled when CONFIG_BLACKFIN was present, whereas it should have been enabled only when CONFIG_BFIN_CPU_FREQ is present. Acked-by: Mike Frysinger vap...@gentoo.org -mike signature.asc Description: This is a digitally signed message part.
Re: [PATCH] arch: blackfin: kernel: sprintf(), need avoid NUL for '%s'
i think you mean NULL instead of NUL that said, the kernel is smart enough to replace NULL with "(null)", so i don't see much point in this patch -mike On Wed, May 29, 2013 at 5:43 AM, Chen Gang wrote: > > When it is kernel symbol, the 'modname' will be NUL, and the 'symname' > contents the valid name. > > So for sprintf(), need avoid NUL for '%s'. > > > Signed-off-by: Chen Gang > --- > arch/blackfin/kernel/trace.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/arch/blackfin/kernel/trace.c b/arch/blackfin/kernel/trace.c > index c36efa0..11f98bb 100644 > --- a/arch/blackfin/kernel/trace.c > +++ b/arch/blackfin/kernel/trace.c > @@ -51,7 +51,7 @@ void decode_address(char *buf, unsigned long address) > if (!modname) > modname = delim = ""; > sprintf(buf, "{ %s%s%s%s + 0x%lx }", > - delim, modname, delim, symname, > + delim, modname ? : "kernel", delim, symname, > (unsigned long)offset); > return; > } > -- > 1.7.7.6 > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] arch: blackfin: kernel: sprintf(), need avoid NUL for '%s'
i think you mean NULL instead of NUL that said, the kernel is smart enough to replace NULL with (null), so i don't see much point in this patch -mike On Wed, May 29, 2013 at 5:43 AM, Chen Gang gang.c...@asianux.com wrote: When it is kernel symbol, the 'modname' will be NUL, and the 'symname' contents the valid name. So for sprintf(), need avoid NUL for '%s'. Signed-off-by: Chen Gang gang.c...@asianux.com --- arch/blackfin/kernel/trace.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/blackfin/kernel/trace.c b/arch/blackfin/kernel/trace.c index c36efa0..11f98bb 100644 --- a/arch/blackfin/kernel/trace.c +++ b/arch/blackfin/kernel/trace.c @@ -51,7 +51,7 @@ void decode_address(char *buf, unsigned long address) if (!modname) modname = delim = ; sprintf(buf, { %s%s%s%s + 0x%lx }, - delim, modname, delim, symname, + delim, modname ? : kernel, delim, symname, (unsigned long)offset); return; } -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/