[PATCH] cxgb3: avoid needless buffer copy for firmware

2015-06-16 Thread Kees Cook
There's no reason to perform a buffer copy for the firmware name. This
also avoids a (currently impossible with current callers) NULL dereference
if there was no matching firmware.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
index b96e4bfcac41..8f7aa53a4c4b 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
@@ -1025,19 +1025,19 @@ int t3_get_edc_fw(struct cphy *phy, int edc_idx, int size)
 {
    struct adapter *adapter = phy->adapter;
    const struct firmware *fw;
-   char buf[64];
+   const char *fw_name;
    u32 csum;
    const __be32 *p;
    u16 *cache = phy->phy_cache;
-   int i, ret;
-
-   snprintf(buf, sizeof(buf), get_edc_fw_name(edc_idx));
+   int i, ret = -EINVAL;
 
-   ret = request_firmware(&fw, buf, &adapter->pdev->dev);
+   fw_name = get_edc_fw_name(edc_idx);
+   if (fw_name)
+           ret = request_firmware(&fw, fw_name, &adapter->pdev->dev);
    if (ret < 0) {
            dev_err(&adapter->pdev->dev,
                    "could not upgrade firmware: unable to load %s\n",
-                   buf);
+                   fw_name);
            return ret;
    }
 
-- 
1.9.1
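
The flow of the change can be tried outside the kernel. What follows is only a userspace sketch, not the driver code: edc_fw_name() and fake_request_firmware() are hypothetical stand-ins for get_edc_fw_name() and request_firmware(), and the file names are placeholders. It shows the returned pointer being used directly, with a missing name turned into -EINVAL instead of being passed on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup; the names are placeholders, not real firmware files. */
static const char *edc_fw_name(int edc_idx)
{
	switch (edc_idx) {
	case 0: return "example/edc0.bin";
	case 1: return "example/edc1.bin";
	default: return NULL;	/* no matching firmware */
	}
}

/* Stand-in for request_firmware(): succeeds for any non-empty name. */
static int fake_request_firmware(const char *name)
{
	return strlen(name) > 0 ? 0 : -ENOENT;
}

/* No local buffer; ret stays -EINVAL when the lookup finds nothing. */
static int load_edc_fw(int edc_idx, const char **used_name)
{
	const char *fw_name = edc_fw_name(edc_idx);
	int ret = -EINVAL;

	assert(used_name != NULL);
	if (fw_name)
		ret = fake_request_firmware(fw_name);
	*used_name = fw_name;
	return ret;
}
```

Calling load_edc_fw() with an index that has no entry returns -EINVAL without ever touching the name.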


-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-07 Thread Kees Cook
On Wed, Oct 7, 2015 at 3:07 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 10/07/2015 11:20 PM, Alexei Starovoitov wrote:
>>
>> On 10/6/15 5:45 AM, Daniel Borkmann wrote:
>>>
>>> Should instead something similar be adapted on bpf(2) as well? Or, would
>>> that even be more painful for application developers shipping their stuff
>>> through distros in the end (where they might then decide to just setup
>>> everything BPF-related and then drop privs)?
>>
>>
>> I think loading as root and then dropping privs won't work in many
>> cases, since apps still need to access maps even after dropping privs
>> and today it's not possible, since cap_sys_admin is tested for every
>> bpf syscall.
>
>
> Yep, maps-only would then need to be made accessible in some way.
>
>>> I'm also wondering with regards to seccomp, which could adapt to eBPF at
>>> some point and be used by unprivileged programs. Perhaps then, a single
>>> paranoia alike setting might not suit to all eBPF subsystem users. Any
>>> ideas?
>>
>>
>> There is no such paranoid sysctl for cBPF, so there is no reason to
>> add one for eBPF other than fear.
>> Adding multiple sysctl knobs for seccomp, socket, tracing is only
>> reflection of even higher fear.
>> What sysadmins suppose to do with such sysctl when kernel is kinda
>> saying 'may be something unsafe here you're on your own' ?
>> Also the presence of this sysctl_bpf_enable_unprivileged or any other
>> one doesn't help with CVEs. Any bug with security implications will
>> be a CVE regardless, so I think the better course of action is to
>> avoid introducing this sysctl.
>
>
> Yes, I agree with you that there would be a CVE regardless. I still
> like the option of configurable access, not a big fan of the sysctl
> either. Thinking out loudly, what about a Kconfig option? We started
> out like this on bpf(2) itself (initially under expert settings, now
> afaik not anymore), and depending on usage scenarios, a requirement
> could be to have immutable cap_sys_admin-only, for other use-cases a
> requirement on the kernel might instead be to have unprivileged users
> as well.

It'd be nice to have it just be a Kconfig, but this shoots
distro-users in the foot if a distro decides to include unpriv bpf and
the user doesn't want it. I think it's probably a good idea to keep
the sysctl.

-Kees

>
>> We've discussed adding something like CAP_BPF to control it,
>> but then again, do we want this because of fear of bugs or because
>> it's actually needed. I think the design of all CAP_* is to give
>> unprivileged users permissions to do something beyond normal that
>> can potentially be harmful for other users or the whole system.
>> In this case it's not the case. One user can load eBPF programs
>> and maps up to its MEMLOCK limit and they cannot interfere with
>> other users or affect the host, so CAP_BPF is not necessary either.
>
>
> Thanks,
> Daniel



-- 
Kees Cook
Chrome OS Security


Re: [PATCH v2 net-next 3/3] bpf: add unprivileged bpf tests

2015-10-08 Thread Kees Cook
> +   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +   BPF_MOV64_IMM(BPF_REG_0, 0),
> +   BPF_EXIT_INSN(),
> +   },
> +   .errstr_unpriv = "R2 pointer arithmetic",
> +   .result_unpriv = REJECT,
> +   .result = ACCEPT,
> +   },
>  };
>
>  static int probe_filter_length(struct bpf_insn *fp)
> @@ -896,13 +1195,24 @@ static int probe_filter_length(struct bpf_insn *fp)
>
>  static int create_map(void)
>  {
> -   long long key, value = 0;
> int map_fd;
>
> -   map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), 1024);
> -   if (map_fd < 0) {
> +   map_fd = bpf_create_map(BPF_MAP_TYPE_HASH,
> +   sizeof(long long), sizeof(long long), 1024);
> +   if (map_fd < 0)
> printf("failed to create map '%s'\n", strerror(errno));
> -   }
> +
> +   return map_fd;
> +}
> +
> +static int create_prog_array(void)
> +{
> +   int map_fd;
> +
> +   map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY,
> +   sizeof(int), sizeof(int), 4);
> +   if (map_fd < 0)
> +   printf("failed to create prog_array '%s'\n", strerror(errno));
>
> return map_fd;
>  }
> @@ -910,13 +1220,17 @@ static int create_map(void)
>  static int test(void)
>  {
> int prog_fd, i, pass_cnt = 0, err_cnt = 0;
> +   bool unpriv = geteuid() != 0;
>
> for (i = 0; i < ARRAY_SIZE(tests); i++) {
> struct bpf_insn *prog = tests[i].insns;
> int prog_type = tests[i].prog_type;
> int prog_len = probe_filter_length(prog);
> int *fixup = tests[i].fixup;
> -   int map_fd = -1;
> +   int *prog_array_fixup = tests[i].prog_array_fixup;
> +   int expected_result;
> +   const char *expected_errstr;
> +   int map_fd = -1, prog_array_fd = -1;
>
> if (*fixup) {
> map_fd = create_map();
> @@ -926,13 +1240,31 @@ static int test(void)
> fixup++;
> } while (*fixup);
> }
> +   if (*prog_array_fixup) {
> +   prog_array_fd = create_prog_array();
> +
> +   do {
> +   prog[*prog_array_fixup].imm = prog_array_fd;
> +   prog_array_fixup++;
> +   } while (*prog_array_fixup);
> +   }
> printf("#%d %s ", i, tests[i].descr);
>
> prog_fd = bpf_prog_load(prog_type ?: BPF_PROG_TYPE_SOCKET_FILTER,
> prog, prog_len * sizeof(struct bpf_insn),
> "GPL", 0);
>
> -   if (tests[i].result == ACCEPT) {
> +   if (unpriv && tests[i].result_unpriv != UNDEF)
> +   expected_result = tests[i].result_unpriv;
> +   else
> +   expected_result = tests[i].result;
> +
> +   if (unpriv && tests[i].errstr_unpriv)
> +   expected_errstr = tests[i].errstr_unpriv;
> +   else
> +   expected_errstr = tests[i].errstr;
> +
> +   if (expected_result == ACCEPT) {
> if (prog_fd < 0) {
> printf("FAIL\nfailed to load prog '%s'\n",
>strerror(errno));
> @@ -947,7 +1279,7 @@ static int test(void)
> err_cnt++;
> goto fail;
> }
> -   if (strstr(bpf_log_buf, tests[i].errstr) == 0) {
> +   if (strstr(bpf_log_buf, expected_errstr) == 0) {
> printf("FAIL\nunexpected error message: %s",
>bpf_log_buf);
> err_cnt++;
> @@ -960,6 +1292,8 @@ static int test(void)
>  fail:
> if (map_fd >= 0)
> close(map_fd);
> +   if (prog_array_fd >= 0)
> +   close(prog_array_fd);
> close(prog_fd);
>
> }
> @@ -970,5 +1304,8 @@ fail:
>
>  int main(void)
>  {
> +   struct rlimit r = {1 << 20, 1 << 20};
> +
> +   setrlimit(RLIMIT_MEMLOCK, &r);
> return test();
>  }
> --
> 1.7.9.5
>
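
The expectation logic in the quoted test loop can be read in isolation. Below is a small standalone sketch of it. The struct names and pick_expected() are illustrative and not part of the patch; only the enum values and the field names follow the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* UNDEF is zero, so an omitted .result_unpriv falls back to .result. */
enum test_result { UNDEF, ACCEPT, REJECT };

struct verifier_expectation {
	const char *errstr;
	const char *errstr_unpriv;
	enum test_result result;
	enum test_result result_unpriv;
};

struct expected {
	enum test_result result;
	const char *errstr;
};

/* Use the *_unpriv fields only when running unprivileged and they are set. */
static struct expected pick_expected(const struct verifier_expectation *t,
				     bool unpriv)
{
	struct expected e;

	assert(t != NULL);
	e.result = (unpriv && t->result_unpriv != UNDEF) ?
			t->result_unpriv : t->result;
	e.errstr = (unpriv && t->errstr_unpriv) ?
			t->errstr_unpriv : t->errstr;
	return e;
}
```

A test that sets only .result and .errstr therefore behaves the same for root and non-root runs.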



-- 
Kees Cook
Chrome OS Security


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-08 Thread Kees Cook
On Wed, Oct 7, 2015 at 4:49 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
> On 10/7/15 3:22 PM, Kees Cook wrote:
>>>
>>> Yes, I agree with you that there would be a CVE regardless. I still
>>> like the option of configurable access, not a big fan of the sysctl
>>> either. Thinking out loudly, what about a Kconfig option? We started
>>> out like this on bpf(2) itself (initially under expert settings, now
>>> afaik not anymore), and depending on usage scenarios, a requirement
>>> could be to have immutable cap_sys_admin-only, for other use-cases a
>>> requirement on the kernel might instead be to have unprivileged users
>>> as well.
>>
>> It'd be nice to have it just be a Kconfig, but this shoots
>> distro-users in the foot if a distro decides to include unpriv bpf and
>> the user doesn't want it. I think it's probably a good idea to keep
>> the sysctl.
>
>
> I don't like introducing Kconfig for no clear reason. It only adds
> to the testing matrix and makes it harder to hack around.
> Paranoid distros can disable bpf via single config already,
> there is no reason to go more fine grained here.
> Unpriv checks add minimal amount of code, so even for tinification
> purpose there is no need to chop off a few bytes. Tiny kernels would
> disable bpf altogether.
>
> As far as sysctl we can look at two with similar purpose:
> sysctl_perf_event_paranoid and modules_disabled.
> First one is indeed multi level, but not because of the fear of bugs,
> but because of real security implications. Like raw events on
> hyperthreaded cpu or uncore events can extract data from other
> user processes. So it controls these extra privileges.
> For bpf there are no hw implications to deal with.
> If we make seccomp+bpf in the future it shouldn't need another knob
> or extra bit. There are no extra privileges to grant, so not needed.
>
> modules_disabled is off by default and can be toggled on once.
> I think for paranoid distro users that "don't want bpf" that is
> the better model.
> So I'm thinking to do sysctl_unprivileged_bpf_disabled that will be
> 0=off by default (meaning that users can load unpriv socket filter
> programs and seccomp in the future) and that can be switched
> to 1=on once and stay that way until reboot.
> I think that's the best balance that avoids adding checks to all
> apps that want to use bpf and admins can still act on it.
> From app point of view it's no different than bpf syscall
> was not compiled in. So single feature test for bpf syscall will
> be enough.

I think this would be great. :)

-Kees
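
For readers following along, the "switch on once, stay on until reboot" behaviour proposed in the quoted text can be modelled in a few lines. This is a userspace sketch of the semantics only: latch_write() is a hypothetical name, and in the kernel this would be a sysctl handler rather than a plain function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* 0 = unprivileged bpf allowed (default), 1 = disabled until reboot. */
static int unprivileged_bpf_disabled;

/*
 * Model of a one-way toggle: the only accepted write is 1.
 * Writing 0 (or anything else) is rejected, so the value never goes back.
 */
static int latch_write(int *latch, int val)
{
	assert(latch != NULL);
	if (val != 1)
		return -EINVAL;
	*latch = 1;
	return 0;
}
```

Once latch_write() has stored 1, every later attempt to write 0 fails and the stored value is unchanged.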

-- 
Kees Cook
Chrome OS Security


Re: [PATCH v2 net-next 1/3] bpf: enable non-root eBPF programs

2015-10-08 Thread Kees Cook
On Wed, Oct 7, 2015 at 10:23 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
> In order to let unprivileged users load and execute eBPF programs
> teach verifier to prevent pointer leaks.
> Verifier will prevent
> - any arithmetic on pointers
>   (except R10+Imm which is used to compute stack addresses)
> - comparison of pointers
>   (except if (map_value_ptr == 0) ... )
> - passing pointers to helper functions
> - indirectly passing pointers in stack to helper functions
> - returning pointer from bpf program
> - storing pointers into ctx or maps
>
> Spill/fill of pointers into stack is allowed, but mangling
> of pointers stored in the stack or reading them byte by byte is not.
>
> Within bpf programs the pointers do exist, since programs need to
> be able to access maps, pass skb pointer to LD_ABS insns, etc
> but programs cannot pass such pointer values to the outside
> or obfuscate them.
>
> Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
> so that socket filters (tcpdump), af_packet (quic acceleration)
> and future kcm can use it.
> tracing and tc cls/act program types still require root permissions,
> since tracing actually needs to be able to see all kernel pointers
> and tc is for root only.
>
> For example, the following unprivileged socket filter program is allowed:
> int bpf_prog1(struct __sk_buff *skb)
> {
>   u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
>   u64 *value = bpf_map_lookup_elem(&my_map, &index);
>
>   if (value)
> *value += skb->len;
>   return 0;
> }
>
> but the following program is not:
> int bpf_prog1(struct __sk_buff *skb)
> {
>   u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
>   u64 *value = bpf_map_lookup_elem(&my_map, &index);
>
>   if (value)
> *value += (u64) skb;
>   return 0;
> }
> since it would leak the kernel address into the map.
>
> Unprivileged socket filter bpf programs have access to the
> following helper functions:
> - map lookup/update/delete (but they cannot store kernel pointers into them)
> - get_random (it's already exposed to unprivileged user space)
> - get_smp_processor_id
> - tail_call into another socket filter program
> - ktime_get_ns
>
> The feature is controlled by sysctl kernel.unprivileged_bpf_disabled.
> This toggle defaults to off (0), but can be set true (1).  Once true,
> bpf programs and maps cannot be accessed from unprivileged process,
> and the toggle cannot be set back to false.
>
> Signed-off-by: Alexei Starovoitov <a...@plumgrid.com>

Reviewed-by: Kees Cook <keesc...@chromium.org>

Thanks for making this safer! :)

-Kees
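
From userspace, the toggle described in the quoted commit message would be visible under /proc/sys. A hedged sketch of a check an application might run before attempting an unprivileged load follows. read_int_file() and unprivileged_bpf_is_disabled() are made-up helper names, and the path assumes the sysctl is exposed as kernel.unprivileged_bpf_disabled, as the patch describes.

```c
#include <assert.h>
#include <stdio.h>

#define UNPRIV_BPF_SYSCTL "/proc/sys/kernel/unprivileged_bpf_disabled"

/* Read a single integer from a file; 0 on success, -1 on any failure. */
static int read_int_file(const char *path, int *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* e.g. a kernel without this sysctl */
	ok = fscanf(f, "%d", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* 1 = disabled, 0 = not disabled, -1 = unknown (missing or unreadable). */
static int unprivileged_bpf_is_disabled(void)
{
	int val;

	if (read_int_file(UNPRIV_BPF_SYSCTL, &val) < 0)
		return -1;
	return val != 0;
}
```

Treating "unknown" as a third state keeps the check usable on kernels that predate the sysctl.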

> ---
> v1->v2:
> - sysctl_unprivileged_bpf_disabled
> - drop bpf_trace_printk
> - split tests into separate patch to ease review
> ---
>  include/linux/bpf.h   |2 +
>  kernel/bpf/syscall.c  |   11 ++---
>  kernel/bpf/verifier.c |  106 
> -
>  kernel/sysctl.c   |   13 ++
>  net/core/filter.c |3 +-
>  5 files changed, 120 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 19b8a2081f88..e472b06df138 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -167,6 +167,8 @@ void bpf_prog_put_rcu(struct bpf_prog *prog);
>  struct bpf_map *bpf_map_get(struct fd f);
>  void bpf_map_put(struct bpf_map *map);
>
> +extern int sysctl_unprivileged_bpf_disabled;
> +
>  /* verify correctness of eBPF program */
>  int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
>  #else
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 5f35f420c12f..9f824b0f0f5f 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -18,6 +18,8 @@
>  #include 
>  #include 
>
> +int sysctl_unprivileged_bpf_disabled __read_mostly;
> +
>  static LIST_HEAD(bpf_map_types);
>
>  static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
> @@ -542,6 +544,9 @@ static int bpf_prog_load(union bpf_attr *attr)
> attr->kern_version != LINUX_VERSION_CODE)
> return -EINVAL;
>
> +   if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN))
> +   return -EPERM;
> +
> /* plain bpf_prog allocation */
> prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
> if (!prog)
> @@ -597,11 +602,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
> union bpf_attr attr = {};
> int err;
>
> -   /* the syscall is limited to root temporarily. This restriction will be
> -* lifted when security audit is clean. 

Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Kees Cook
   BPF_MOV64_IMM(BPF_REG_0, 0),
> +   BPF_EXIT_INSN(),
> +   },
> +   .errstr_unpriv = "attempt to corrupt spilled",
> +   .result_unpriv = REJECT,
> +   .result = ACCEPT,
> +   },
> +   {
> +   "unpriv: read pointer from stack in small chunks",
> +   .insns = {
> +   BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
> +   BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -8),
> +   BPF_MOV64_IMM(BPF_REG_0, 0),
> +   BPF_EXIT_INSN(),
> +   },
> +   .errstr = "invalid size",
> +   .result = REJECT,
> +   },
> +   {
> +   "unpriv: write pointer into ctx",
> +   .insns = {
> +   BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0),
> +   BPF_MOV64_IMM(BPF_REG_0, 0),
> +   BPF_EXIT_INSN(),
> +   },
> +   .errstr_unpriv = "R1 leaks addr",
> +   .result_unpriv = REJECT,
> +   .errstr = "invalid bpf_context access",
> +   .result = REJECT,
> +   },
> +   {
> +   "unpriv: write pointer into map elem value",
> +   .insns = {
> +   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +   BPF_LD_MAP_FD(BPF_REG_1, 0),
> +   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
> +   BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
> +   BPF_EXIT_INSN(),
> +   },
> +   .fixup = {3},
> +   .errstr_unpriv = "R0 leaks addr",
> +   .result_unpriv = REJECT,
> +   .result = ACCEPT,
> +   },
> +   {
> +   "unpriv: partial copy of pointer",
> +   .insns = {
> +   BPF_MOV32_REG(BPF_REG_1, BPF_REG_10),
> +   BPF_MOV64_IMM(BPF_REG_0, 0),
> +   BPF_EXIT_INSN(),
> +   },
> +   .errstr_unpriv = "R10 partial copy",
> +   .result_unpriv = REJECT,
> +   .result = ACCEPT,
> +   },
>  };
>
>  static int probe_filter_length(struct bpf_insn *fp)
> @@ -910,12 +1130,15 @@ static int create_map(void)
>  static int test(void)
>  {
> int prog_fd, i, pass_cnt = 0, err_cnt = 0;
> +   bool unpriv = geteuid() != 0;
>
> for (i = 0; i < ARRAY_SIZE(tests); i++) {
> struct bpf_insn *prog = tests[i].insns;
> int prog_type = tests[i].prog_type;
> int prog_len = probe_filter_length(prog);
> int *fixup = tests[i].fixup;
> +   int expected_result;
> +   const char *expected_errstr;
> int map_fd = -1;
>
> if (*fixup) {
> @@ -932,7 +1155,17 @@ static int test(void)
> prog, prog_len * sizeof(struct bpf_insn),
> "GPL", 0);
>
> -   if (tests[i].result == ACCEPT) {
> +   if (unpriv && tests[i].result_unpriv != UNDEF)
> +   expected_result = tests[i].result_unpriv;
> +   else
> +   expected_result = tests[i].result;
> +
> +   if (unpriv && tests[i].errstr_unpriv)
> +   expected_errstr = tests[i].errstr_unpriv;
> +   else
> +   expected_errstr = tests[i].errstr;
> +
> +   if (expected_result == ACCEPT) {
> if (prog_fd < 0) {
> printf("FAIL\nfailed to load prog '%s'\n",
>strerror(errno));
> @@ -947,7 +1180,7 @@ static int test(void)
> err_cnt++;
> goto fail;
> }
> -   if (strstr(bpf_log_buf, tests[i].errstr) == 0) {
> +   if (strstr(bpf_log_buf, expected_errstr) == 0) {
> printf("FAIL\nunexpected error message: %s",
>bpf_log_buf);
> err_cnt++;
> --
> 1.7.9.5
>



-- 
Kees Cook
Chrome OS Security


Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program

2015-09-04 Thread Kees Cook
On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.ander...@canonical.com> wrote:
> This commit adds a way to dump eBPF programs. The initial implementation
> doesn't support maps, and therefore only allows dumping seccomp ebpf
> programs which themselves don't currently support maps.
>
> We export the GPL bit as well as a unique ID for the program so that

This unique ID appears to be the heap address for the prog. That's a
huge leak, and should not be done. We don't want to introduce new
kernel address leaks while we're trying to fix the remaining ones.
Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE could
be used, for example.

-Kees
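
For reference, this is roughly how userspace can ask whether two descriptors refer to the same open file without ever seeing a kernel address. It is a minimal sketch that assumes a kernel with kcmp(2) available. same_open_file() is an illustrative wrapper, not an existing API, and whether KCMP_FILE is the right notion of identity for a bpf program is still an open question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if fd_a and fd_b in the calling process share one open file
 * description, 0 if they do not, and -1 if kcmp(2) fails (for example when
 * the kernel lacks kcmp support or a sandbox blocks the call).
 */
static int same_open_file(int fd_a, int fd_b)
{
	pid_t self = getpid();
	long r;

	assert(fd_a >= 0 && fd_b >= 0);
	r = syscall(SYS_kcmp, self, self, KCMP_FILE, fd_a, fd_b);
	if (r < 0)
		return -1;
	return r == 0;	/* kcmp returns 0 when the two objects are the same */
}
```

A descriptor and its dup() compare equal here, while two independently opened descriptors do not.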

> userspace can detect when two seccomp filters were inherited from each
> other and clone the filter tree accordingly.
>
> Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
> CC: Kees Cook <keesc...@chromium.org>
> CC: Will Drewry <w...@chromium.org>
> CC: Oleg Nesterov <o...@redhat.com>
> CC: Andy Lutomirski <l...@amacapital.net>
> CC: Pavel Emelyanov <xe...@parallels.com>
> CC: Serge E. Hallyn <serge.hal...@ubuntu.com>
> CC: Alexei Starovoitov <a...@kernel.org>
> CC: Daniel Borkmann <dan...@iogearbox.net>
> ---
>  include/uapi/linux/bpf.h | 15 +++
>  kernel/bpf/syscall.c | 44 
>  2 files changed, 59 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 79b825a..c5d8dc2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -107,6 +107,13 @@ enum bpf_cmd {
>  * returns fd or negative error
>  */
> BPF_PROG_LOAD,
> +
> +   /* dump an existing bpf
> +* err = bpf(BPF_PROG_DUMP, union bpf_attr *attr, u32 size)
> +* Using attr->prog_fd, attr->dump_insn_cnt, attr->dump_insns
> +* returns zero or negative error
> +*/
> +   BPF_PROG_DUMP,
>  };
>
>  enum bpf_map_type {
> @@ -160,6 +167,14 @@ union bpf_attr {
> __aligned_u64   log_buf;/* user supplied buffer */
> __u32   kern_version;   /* checked when prog_type=kprobe */
> };
> +
> +   struct { /* anonymous struct used by BPF_PROG_DUMP command */
> +   __u32   prog_fd;
> +   __u32   dump_insn_cnt;
> +   __aligned_u64   dump_insns; /* user supplied buffer */
> +   __u8    gpl_compatible;
> +   __u64   prog_id;/* unique id for this prog */
> +   };
>  } __attribute__((aligned(8)));
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index a1b14d1..ee580d0 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -586,6 +586,47 @@ free_prog:
> return err;
>  }
>
> +static int bpf_prog_dump(union bpf_attr *attr, union __user bpf_attr *uattr)
> +{
> +   int ufd = attr->prog_fd;
> +   struct fd f = fdget(ufd);
> +   struct bpf_prog *prog;
> +   int ret = -EINVAL;
> +
> +   prog = get_prog(f);
> +   if (IS_ERR(prog))
> +   return PTR_ERR(prog);
> +
> +   /* For now, let's refuse to dump anything that isn't a seccomp program.
> +* Other program types have support for maps, which our current dump
> +* code doesn't support.
> +*/
> +   if (prog->type != BPF_PROG_TYPE_SECCOMP)
> +   goto out;
> +
> +   ret = -EFAULT;
> +   if (put_user(prog->len, &uattr->dump_insn_cnt))
> +   goto out;
> +
> +   if (put_user((u8) prog->gpl_compatible, &uattr->gpl_compatible))
> +   goto out;
> +
> +   if (put_user((u64) prog, &uattr->prog_id))
> +   goto out;
> +
> +   if (attr->dump_insns) {
> +   u32 len = prog->len * sizeof(struct bpf_insn);
> +
> +   if (copy_to_user(u64_to_ptr(attr->dump_insns),
> +prog->insns, len) != 0)
> +   goto out;
> +   }
> +
> +   ret = 0;
> +out:
> +   return ret;
> +}
> +
>  SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
>  {
>     union bpf_attr attr = {};
> @@ -650,6 +691,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
> case BPF_PROG_LOAD:
> err = bpf_prog_load(&attr);
> break;
> +   case BPF_PROG_DUMP:
> +   err = bpf_prog_dump(&attr, uattr);
> +   break;
> default:
> err = -EINVAL;
> break;
> --
> 2.1.4
>



-- 
Kees Cook
Chrome OS Security


Re: [PATCH 4/6] seccomp: add a way to access filters via bpf fds

2015-09-04 Thread Kees Cook
On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.ander...@canonical.com> wrote:
> This patch adds a way for a process that is "real root" to access the
> seccomp filters of another process. The process first does a
> PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> bpf(BPF_PROG_DUMP) to dump the actual program at each step.

Why is this a new ptrace interface instead of a new seccomp interface?
I would expect this to only be valid for "current", otherwise we could
run into races as the ptracee adds filters. i.e. it is not safe to
examine seccomp filters from tasks other than current.

-Kees

>
> Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
> CC: Kees Cook <keesc...@chromium.org>
> CC: Will Drewry <w...@chromium.org>
> CC: Oleg Nesterov <o...@redhat.com>
> CC: Andy Lutomirski <l...@amacapital.net>
> CC: Pavel Emelyanov <xe...@parallels.com>
> CC: Serge E. Hallyn <serge.hal...@ubuntu.com>
> CC: Alexei Starovoitov <a...@kernel.org>
> CC: Daniel Borkmann <dan...@iogearbox.net>
> ---
>  include/linux/bpf.h | 12 ++
>  include/linux/seccomp.h | 14 +++
>  include/uapi/linux/ptrace.h |  3 +++
>  kernel/bpf/syscall.c| 26 -
>  kernel/ptrace.c |  7 ++
>  kernel/seccomp.c| 57 
> +
>  6 files changed, 118 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 4383476..30682dc 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -157,6 +157,8 @@ void bpf_register_prog_type(struct bpf_prog_type_list *tl);
>  void bpf_register_map_type(struct bpf_map_type_list *tl);
>
>  struct bpf_prog *bpf_prog_get(u32 ufd);
> +int bpf_prog_set(u32 ufd, struct bpf_prog *new);
> +int bpf_new_fd(struct bpf_prog *prog, int flags);
>  void bpf_prog_put(struct bpf_prog *prog);
>  void bpf_prog_put_rcu(struct bpf_prog *prog);
>
> @@ -175,6 +177,16 @@ static inline struct bpf_prog *bpf_prog_get(u32 ufd)
> return ERR_PTR(-EOPNOTSUPP);
>  }
>
> +static inline int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> +{
> +   return -EINVAL;
> +}
> +
> +static inline int bpf_new_fd(struct bpf_prog *prog, int flags)
> +{
> +   return -EINVAL;
> +}
> +
>  static inline void bpf_prog_put(struct bpf_prog *prog)
>  {
>  }
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index f426503..d1a86ed 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -95,4 +95,18 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
> return;
>  }
>  #endif /* CONFIG_SECCOMP_FILTER */
> +
> +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
> +extern long seccomp_get_filter_fd(struct task_struct *child);
> +extern long seccomp_next_filter(struct task_struct *child, u32 fd);
> +#else
> +static inline long seccomp_get_filter_fd(struct task_struct *child)
> +{
> +   return -EINVAL;
> +}
> +static inline long seccomp_next_filter(struct task_struct *child, u32 fd)
> +{
> +   return -EINVAL;
> +}
> +#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
>  #endif /* _LINUX_SECCOMP_H */
> diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
> index a7a6979..dfd7d2e 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -23,6 +23,9 @@
>
>  #define PTRACE_SYSCALL   24
>
> +#define PTRACE_SECCOMP_GET_FILTER_FD   40
> +#define PTRACE_SECCOMP_NEXT_FILTER 41
> +
>  /* 0x4200-0x4300 are reserved for architecture-independent additions.  */
>  #define PTRACE_SETOPTIONS  0x4200
>  #define PTRACE_GETEVENTMSG 0x4201
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ee580d0..58e7421 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -506,6 +506,30 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
>  }
>  EXPORT_SYMBOL_GPL(bpf_prog_get);
>
> +int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> +{
> +   struct fd f;
> +   struct bpf_prog *prog;
> +
> +   f = fdget(ufd);
> +
> +   prog = get_prog(f);
> +   if (!IS_ERR(prog) && prog)
> +   bpf_prog_put(prog);
> +
> +   atomic_inc(&new->aux->refcnt);
> +   f.file->private_data = new;
> +   fdput(f);
> +   return 0;
> +}
> +EXPORT_SYMBOL_GPL(bpf_prog_set);
> +
> +int bpf_new_fd(struct bpf_prog *prog, int flags)
> +{
> +

Re: eBPF / seccomp globals?

2015-09-04 Thread Kees Cook
On Fri, Sep 4, 2015 at 1:29 PM, Michael Tirado <mtirado...@gmail.com> wrote:
>> What we did in Chrome OS was to use the "minijail" tool[2] to
>> LD_PRELOAD a .so that sets up the seccomp filter after the exec. It's
>> a bit of a hack, but works in well-defined environments. You are
>> talking about namespaces, though, so maybe minijail is worth a look?
>> It does that too and a whole lot more.
>
> Minijail is pretty similar to what I have been working on the past few
> months,  unfortunately I have already written it, doh!  Those slides
> are a good resource,  definitely helpful as introduction to seccomp.
>
> So it seems there are no easy solutions to this problem. Using
> LD_PRELOAD to defer seccomp filter application scares me a little bit,
> and won't work with file capabilities IIRC, though it is a damn clever

Do you still need file capabilities with the availability of the new
ambient capabilities?

https://s3hh.wordpress.com/2015/07/25/ambient-capabilities/
http://thread.gmane.org/gmane.linux.kernel.lsm/24034

> solution.  I think for now I will explore the possibility of
> validating argument 1 of exec to allow only the program I am launching
> to be exec'd, so if somehow by Thor's hammer that program escapes its
> sandbox, it will only be able to exec itself.  I suppose it will have
> to now be restricted to absolute paths only.

Well, you can only examine the memory address and not what's pointed
to, so you may be out of luck there too. Sorry! On the TODO list is
doing deep argument inspection, but it is not an easy thing to get
right. :)

-Kees
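
To make the limitation concrete: a seccomp filter only ever sees struct seccomp_data, where each syscall argument is a plain 64-bit value. The sketch below builds, but does not install, a classic BPF filter that compares the low 32 bits of execve's first argument against a known address. build_execve_ptr_filter() is a hypothetical helper, the argument offset assumes a little-endian machine, and a real filter would also have to check the arch field and the upper 32 bits. The comparison is on the pointer, never on the path string it points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Copies the filter into out[] and returns its length, or 0 if cap is too small. */
static size_t build_execve_ptr_filter(struct sock_filter *out, size_t cap,
				      const char *allowed_path)
{
	uint32_t low = (uint32_t)(uintptr_t)allowed_path;
	struct sock_filter prog[] = {
		/* A = syscall number */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* not execve: skip ahead to ALLOW */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_execve, 0, 3),
		/* A = low 32 bits of args[0], i.e. the pointer value itself */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, args[0])),
		/* same address: ALLOW, otherwise fall through to KILL */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, low, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	size_t n = sizeof(prog) / sizeof(prog[0]);

	assert(out != NULL && allowed_path != NULL);
	if (cap < n)
		return 0;
	memcpy(out, prog, sizeof(prog));
	return n;
}
```

Two different buffers holding the same path would fail this check, and the same buffer with a changed path would pass it, which is why address comparison is not a substitute for deep argument inspection.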

>
> Thanks everyone for the clarification!
>
> On Fri, Sep 4, 2015 at 4:01 AM, Kees Cook <keesc...@chromium.org> wrote:
>> On Thu, Sep 3, 2015 at 6:01 PM, Michael Tirado <mtirado...@gmail.com> wrote:
>>> Hiyall,
>>>
>>> I have created a seccomp white list filter for a program that launches
>>> other less trustworthy programs.  It's working great so far, but I
>>> have run into a little roadblock.  the launcher program needs to call
>>> execve as it's final step, but that may not be present in the white
>>> list.  I am wondering if there is any way to use some sort of global
>>> variable that will be preserved between syscall filter calls so that I
>>> can allow only one execve, if not present in white list by
>>> incrementing a counter variable.
>>>
>>> I see that in Documentation/networking/filter.txt one of the registers
>>> is documented as being a pointer to struct sk_buff, in the seccomp
>>> context this is a pointer to struct seccomp_data  instead, right?  and
>>> the line about callee saved registers R6-R9  probably refers to them
>>> being saved across calls within that filter, and not calls between
>>> filters?
>>>
>>> My apologies if this is not the appropriate place to ask for help, but
>>> it is difficult to find useful information on how eBPF works, and is a
>>> bit confusing trying to figure out the differences between seccomp and
>>> net filters, and the old bpf code kicking around short of spending
>>> countless hours reading through all of it.  If anybody has a some
>>> links to share I would be very grateful.  the only way I can think to
>>> make this work otherwise is to mount everything as MS_NOEXEC in the
>>> new namespace, but that just feels wrong.
>>
>> For documentation, there's some great slides on seccomp from Plumber's
>> this year[1].
>>
>> At present, there is no variable state beyond the syscall context (PC,
>> args) available to seccomp filters. The no_new_privs prctl was added
>> to reduce the risk of including execve in a filter's whitelist, but
>> that isn't as strong as the "exec once" feature you want.
>>
>> What we did in Chrome OS was to use the "minijail" tool[2] to
>> LD_PRELOAD a .so that sets up the seccomp filter after the exec. It's
>> a bit of a hack, but works in well-defined environments. You are
>> talking about namespaces, though, so maybe minijail is worth a look?
>> It does that too and a whole lot more.
>>
>> As for using maps via eBPF in seccomp, it's on the horizon, but it
>> comes with a lot of exposure that I haven't finished pondering, so I
>> don't think those features will be added soon.
>>
>> -Kees
>>
>> [1] 
>> http://man7.org/conf/lpc2015/limiting_kernel_attack_surface_with_seccomp-LPC_2015-Kerrisk.pdf
>> [2] see subdirectory "minijail" after "git clone
>> https://chromium.googlesource.com/chromiumos/platform2/;
>>
>>
>> --
>> Kees Cook
>> Chrome OS Security



-- 
Kees Cook
Chrome OS Security


Re: [PATCH 1/6] ebpf: add a seccomp program type

2015-09-04 Thread Kees Cook
On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.ander...@canonical.com> wrote:
> seccomp uses eBPF as its underlying storage and execution format, and eBPF
> has features that seccomp would like to make use of in the future. This
> patch adds a formal seccomp type to the eBPF verifier.
>
> The current implementation of the seccomp eBPF type is very limited, and
> doesn't support some interesting features (notably, maps) of eBPF. However,
> the primary motivation for this patchset is to enable checkpoint/restore
> for seccomp filters later in the series, so this limited feature set is ok
> for now.
>
> Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
> CC: Kees Cook <keesc...@chromium.org>
> CC: Will Drewry <w...@chromium.org>
> CC: Oleg Nesterov <o...@redhat.com>
> CC: Andy Lutomirski <l...@amacapital.net>
> CC: Pavel Emelyanov <xe...@parallels.com>
> CC: Serge E. Hallyn <serge.hal...@ubuntu.com>
> CC: Alexei Starovoitov <a...@kernel.org>
> CC: Daniel Borkmann <dan...@iogearbox.net>
> ---
>  include/uapi/linux/bpf.h |  1 +
>  net/core/filter.c| 95 
> 
>  2 files changed, 96 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 29ef6f9..79b825a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -122,6 +122,7 @@ enum bpf_prog_type {
> BPF_PROG_TYPE_KPROBE,
> BPF_PROG_TYPE_SCHED_CLS,
> BPF_PROG_TYPE_SCHED_ACT,
> +   BPF_PROG_TYPE_SECCOMP,
>  };
>
>  #define BPF_PSEUDO_MAP_FD  1
> diff --git a/net/core/filter.c b/net/core/filter.c
> index be3098f..ed339fa 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1466,6 +1466,39 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
> }
>  }
>
> +static const struct bpf_func_proto *
> +seccomp_func_proto(enum bpf_func_id func_id)
> +{
> +   /* Right now seccomp eBPF loading doesn't support maps; seccomp filters
> +* are considered to be read-only after they're installed, so map fds
> +* probably need to be invalidated when a seccomp filter with maps is
> +* installed.
> +*
> +* The rest of these might be reasonable to call from seccomp, so we
> +* export them.
> +*/
> +   switch (func_id) {
> +   case BPF_FUNC_ktime_get_ns:
> +   return &bpf_ktime_get_ns_proto;
> +   case BPF_FUNC_trace_printk:
> +   return bpf_get_trace_printk_proto();
> +   case BPF_FUNC_get_prandom_u32:
> +   return &bpf_get_prandom_u32_proto;
> +   case BPF_FUNC_get_smp_processor_id:
> +   return &bpf_get_smp_processor_id_proto;
> +   case BPF_FUNC_tail_call:
> +   return &bpf_tail_call_proto;
> +   case BPF_FUNC_get_current_pid_tgid:
> +   return &bpf_get_current_pid_tgid_proto;
> +   case BPF_FUNC_get_current_uid_gid:
> +   return &bpf_get_current_uid_gid_proto;
> +   case BPF_FUNC_get_current_comm:
> +   return &bpf_get_current_comm_proto;
> +   default:
> +   return NULL;
> +   }
> +}

While this list is probably fine, I don't want to mix the addition of
eBPF functions to the seccomp ABI with the CRIU changes. No function
calls are currently possible and it should stay that way.

I was expecting to see a validator, similar to the existing BPF
validator that is called when creating seccomp filters currently. Can
we add a similar validator for new BPF_PROG_TYPE_SECCOMP?

-Kees

> +
>  static bool __is_valid_access(int off, int size, enum bpf_access_type type)
>  {
> /* check bounds */
> @@ -1516,6 +1549,17 @@ static bool tc_cls_act_is_valid_access(int off, int size,
> return __is_valid_access(off, size, type);
>  }
>
> +static bool seccomp_is_valid_access(int off, int size,
> +   enum bpf_access_type type)
> +{
> +   if (type == BPF_WRITE)
> +   return false;
> +
> +   if (off < 0 || off >= sizeof(struct seccomp_data) || off & 3)
> +   return false;
> +
> +   return true;
> +}
>  static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
>   int src_reg, int ctx_off,
>   struct bpf_insn *insn_buf)
> @@ -1630,6 +1674,45 @@ static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
> return insn - insn_buf;
>  }
>
> +static u32 seccomp_convert_ctx_access(enum bpf_access_type type, int dst_reg,
> +

Re: [PATCH 1/6] ebpf: add a seccomp program type

2015-09-04 Thread Kees Cook
On Fri, Sep 4, 2015 at 2:06 PM, Tycho Andersen
<tycho.ander...@canonical.com> wrote:
> On Fri, Sep 04, 2015 at 01:34:12PM -0700, Kees Cook wrote:
>> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> <tycho.ander...@canonical.com> wrote:
>> > +static const struct bpf_func_proto *
>> > +seccomp_func_proto(enum bpf_func_id func_id)
>> > +{
>> > +   /* Right now seccomp eBPF loading doesn't support maps; seccomp filters
>> > +* are considered to be read-only after they're installed, so map fds
>> > +* probably need to be invalidated when a seccomp filter with maps is
>> > +* installed.
>> > +*
>> > +* The rest of these might be reasonable to call from seccomp, so we
>> > +* export them.
>> > +*/
>> > +   switch (func_id) {
>> > +   case BPF_FUNC_ktime_get_ns:
>> > +   return &bpf_ktime_get_ns_proto;
>> > +   case BPF_FUNC_trace_printk:
>> > +   return bpf_get_trace_printk_proto();
>> > +   case BPF_FUNC_get_prandom_u32:
>> > +   return &bpf_get_prandom_u32_proto;
>> > +   case BPF_FUNC_get_smp_processor_id:
>> > +   return &bpf_get_smp_processor_id_proto;
>> > +   case BPF_FUNC_tail_call:
>> > +   return &bpf_tail_call_proto;
>> > +   case BPF_FUNC_get_current_pid_tgid:
>> > +   return &bpf_get_current_pid_tgid_proto;
>> > +   case BPF_FUNC_get_current_uid_gid:
>> > +   return &bpf_get_current_uid_gid_proto;
>> > +   case BPF_FUNC_get_current_comm:
>> > +   return &bpf_get_current_comm_proto;
>> > +   default:
>> > +   return NULL;
>> > +   }
>> > +}
>>
>> While this list is probably fine, I don't want to mix the addition of
>> eBPF functions to the seccomp ABI with the CRIU changes. No function
>> calls are currently possible and it should stay that way.
>
> Ok, I can remove them.
>
>> I was expecting to see a validator, similar to the existing BPF
>> validator that is called when creating seccomp filters currently. Can
>> we add a similar validator for new BPF_PROG_TYPE_SECCOMP?
>
> That's effectively what this patch does; when the eBPF is loaded via
> bpf(), you tell bpf() you want a BPF_PROG_TYPE_SECCOMP, and it invokes
> this validation/translation code, i.e. it uses
> seccomp_is_valid_access() to check and make sure accesses are aligned
> and inside struct seccomp_data.
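
For readers following along, the access check described above can be tried
in userspace. This is only a sketch: struct seccomp_data_model and
access_ok() are made-up names that mirror the three tests in the quoted
seccomp_is_valid_access() (no writes, offset inside the structure, 4-byte
alignment). Like the patch, it does not look at the access size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the struct seccomp_data layout (assumed 64 bytes). */
struct seccomp_data_model {
	int32_t  nr;                  /* system call number */
	uint32_t arch;                /* AUDIT_ARCH_* value */
	uint64_t instruction_pointer; /* PC at the time of the syscall */
	uint64_t args[6];             /* syscall arguments */
};

static_assert(sizeof(struct seccomp_data_model) == 64,
	      "model is expected to be 64 bytes");

/* Hypothetical helper: the same three tests as the quoted patch. */
static bool access_ok(int off, int size, bool is_write)
{
	(void)size; /* the quoted patch does not validate the size either */

	if (is_write)
		return false; /* the context is read-only */
	if (off < 0 || (size_t)off >= sizeof(struct seccomp_data_model))
		return false; /* outside the structure */
	if (off & 3)
		return false; /* not 4-byte aligned */
	return true;
}
```

With this model, offsets 0 and 60 pass, while 2 (unaligned), 64 (past the
end) and any write are rejected.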

What about limiting the possible instructions?

-Kees

-- 
Kees Cook
Chrome OS Security


Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd

2015-09-04 Thread Kees Cook
On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.ander...@canonical.com> wrote:
> This is the final bit needed to support seccomp filters created via the bpf
> syscall.
>
> One concern with this patch is exactly what the interface should look like
> for users, since seccomp()'s second argument is a pointer, we could ask
> people to pass a pointer to the fd, but implies we might write to it which
> seems impolite. Right now we cast the pointer (and force the user to cast
> it), which generates ugly warnings. I'm not sure what the right answer is
> here.
>
> Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
> CC: Kees Cook <keesc...@chromium.org>
> CC: Will Drewry <w...@chromium.org>
> CC: Oleg Nesterov <o...@redhat.com>
> CC: Andy Lutomirski <l...@amacapital.net>
> CC: Pavel Emelyanov <xe...@parallels.com>
> CC: Serge E. Hallyn <serge.hal...@ubuntu.com>
> CC: Alexei Starovoitov <a...@kernel.org>
> CC: Daniel Borkmann <dan...@iogearbox.net>
> ---
>  include/linux/seccomp.h  |  3 +-
>  include/uapi/linux/seccomp.h |  1 +
>  kernel/seccomp.c | 70 
> 
>  3 files changed, 61 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index d1a86ed..a725dd5 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -3,7 +3,8 @@
>
>  #include <uapi/linux/seccomp.h>
>
> -#define SECCOMP_FILTER_FLAG_MASK   (SECCOMP_FILTER_FLAG_TSYNC)
> +#define SECCOMP_FILTER_FLAG_MASK   (\
> +   SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>
>  #ifdef CONFIG_SECCOMP
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 0f238a4..c29a423 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -16,6 +16,7 @@
>
>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>  #define SECCOMP_FILTER_FLAG_TSYNC  1
> +#define SECCOMP_FILTER_FLAG_EBPF   (1 << 1)
>
>  /*
>   * All BPF programs must return a 32-bit value.
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index a2c5b32..9c6bea6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>
> BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>
> -   /*
> -* Installing a seccomp filter requires that the task has
> -* CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> -* This avoids scenarios where unprivileged tasks can affect the
> -* behavior of privileged children.
> -*/
> -   if (!task_no_new_privs(current) &&
> -   security_capable_noaudit(current_cred(), current_user_ns(),
> -CAP_SYS_ADMIN) != 0)
> -   return ERR_PTR(-EACCES);
> -
> /* Allocate a new seccomp_filter */
> sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
> if (!sfilter)
> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
> info.si_syscall = syscall;
> force_sig_info(SIGSYS, &info, current);
>  }
> +
> +#ifdef CONFIG_BPF_SYSCALL
> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
> +{
> +   /* XXX: this cast generates a warning. should we make people pass in
> +* &fd, or is there some nicer way of doing this?
> +*/
> +   u32 fd = (u32) filter;

I think this is probably the right way to do it, modulo getting the
warning fixed. Let me invoke the great linux-api subscribers to get
some more opinions.

tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
pointer argument into an fd argument. Is this sane, should it be a
pointer to an fd, or should it not be a flag at all, creating a new
seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?

-Kees
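
On the warning itself: it typically comes from converting between a pointer
and an integer of a different width. A minimal userspace sketch of the usual
workaround follows, going through an integer type that is pointer-sized.
fd_to_arg() and arg_to_fd() are hypothetical helpers for illustration only;
nothing in the patch defines them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; not part of any kernel or libc API. */

/* Widen the fd to pointer size first, so the conversion does not warn. */
static void *fd_to_arg(int fd)
{
	assert(fd >= 0); /* valid descriptors are non-negative */
	return (void *)(uintptr_t)fd;
}

/* Narrow back through uintptr_t; only the low bits carry the fd. */
static int arg_to_fd(const void *arg)
{
	return (int)(uintptr_t)arg;
}
```

A caller would pass fd_to_arg(fd) where the pointer argument is expected,
and the receiving side would recover the value with arg_to_fd().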

> +   struct seccomp_filter *ret;
> +   struct bpf_prog *prog;
> +
> +   prog = bpf_prog_get(fd);
> +   if (IS_ERR(prog))
> +   return (struct seccomp_filter *) prog;
> +
> +   if (prog->type != BPF_PROG_TYPE_SECCOMP) {
> +   bpf_prog_put(prog);
> +   return ERR_PTR(-EINVAL);
> +   }
> +
> +   ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
> +   if (!ret) {
> +   bpf_prog_put(prog);
> +   return ERR_PTR(-ENOMEM);
> +   }
> +
> +   ret->prog = prog;
> +   atomic_set(&ret->usage, 1);
> +
> +   /* Intentionally don't bpf_prog_put() here, because the underlying prog
> +* is refcounte

Re: [PATCH 3/6] ebpf: add a way to dump an eBPF program

2015-09-04 Thread Kees Cook
On Fri, Sep 4, 2015 at 1:45 PM, Tycho Andersen
<tycho.ander...@canonical.com> wrote:
> On Fri, Sep 04, 2015 at 01:17:30PM -0700, Kees Cook wrote:
>> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> <tycho.ander...@canonical.com> wrote:
>> > This commit adds a way to dump eBPF programs. The initial implementation
>> > doesn't support maps, and therefore only allows dumping seccomp ebpf
>> > programs which themselves don't currently support maps.
>> >
>> > We export the GPL bit as well as a unique ID for the program so that
>>
>> This unique ID appears to be the heap address for the prog. That's a
>> huge leak, and should not be done. We don't want to introduce new
>> kernel address leaks while we're trying to fix the remaining ones.
>> Shouldn't the "unique ID" be the fd itself? I imagine KCMP_FILE
>> could be used, for example.
>
> No; we acquire the fd per process, so if a task installs a filter and
> then forks N times, we'll grab N (+1) copies of the filter from N (+1)
> different file descriptors. Ideally, we'd have some way to figure out
> that these were all the same. Some sort of prog_id is one way,
> although there may be others.

If KCMP_FILE or a new KCMP_BPF isn't possible, then we'll probably
have to add a unique id (counter) to all bpf programs as they're
created.

-Kees
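
A counter-based id of the kind suggested above could look roughly like the
following userspace sketch. The names (struct prog_model, next_prog_id,
prog_assign_id) are invented for illustration and C11 atomics stand in for
whatever the kernel would use. The point is only that the id says nothing
about where the program lives in memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical program object; only the id field matters here. */
struct prog_model {
	uint64_t id;
};

/* Global counter; ids start at 1 so 0 can mean "not assigned". */
static atomic_uint_fast64_t next_prog_id = 1;

/* Give the program the next id. Safe to call from several threads. */
static void prog_assign_id(struct prog_model *p)
{
	assert(p != NULL);
	p->id = atomic_fetch_add(&next_prog_id, 1);
}
```

Two copies of a filter fetched through different file descriptors would then
report the same id if they refer to the same program.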

-- 
Kees Cook
Chrome OS Security


Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd

2015-09-08 Thread Kees Cook
On Tue, Sep 8, 2015 at 6:40 AM, Tycho Andersen
<tycho.ander...@canonical.com> wrote:
> On Sat, Sep 05, 2015 at 09:13:02AM +0200, Michael Kerrisk (man-pages) wrote:
>> On 09/04/2015 10:41 PM, Kees Cook wrote:
>> > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> > <tycho.ander...@canonical.com> wrote:
>> >> This is the final bit needed to support seccomp filters created via the
>> >> bpf syscall.
>>
>> Hmm. Thanks Kees, for CCing linux-api@. That really should have been done at
>> the outset.
>
> Apologies, I'll cc the list on future versions.
>
>> Tycho, where's the man-pages patch describing this new kernel-userspace
>> API feature? :-)
>
> Once we get the API finalized I'm happy to write it.
>
>> >> One concern with this patch is exactly what the interface should look like
>> >> for users, since seccomp()'s second argument is a pointer, we could ask
>> >> people to pass a pointer to the fd, but implies we might write to it which
>> >> seems impolite. Right now we cast the pointer (and force the user to cast
>> >> it), which generates ugly warnings. I'm not sure what the right answer is
>> >> here.
>> >>
>> >> Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
>> >> CC: Kees Cook <keesc...@chromium.org>
>> >> CC: Will Drewry <w...@chromium.org>
>> >> CC: Oleg Nesterov <o...@redhat.com>
>> >> CC: Andy Lutomirski <l...@amacapital.net>
>> >> CC: Pavel Emelyanov <xe...@parallels.com>
>> >> CC: Serge E. Hallyn <serge.hal...@ubuntu.com>
>> >> CC: Alexei Starovoitov <a...@kernel.org>
>> >> CC: Daniel Borkmann <dan...@iogearbox.net>
>> >> ---
>> >>  include/linux/seccomp.h  |  3 +-
>> >>  include/uapi/linux/seccomp.h |  1 +
>> >>  kernel/seccomp.c | 70 
>> >> 
>> >>  3 files changed, 61 insertions(+), 13 deletions(-)
>> >>
>> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> >> index d1a86ed..a725dd5 100644
>> >> --- a/include/linux/seccomp.h
>> >> +++ b/include/linux/seccomp.h
>> >> @@ -3,7 +3,8 @@
>> >>
>> >>  #include <uapi/linux/seccomp.h>
>> >>
>> >> -#define SECCOMP_FILTER_FLAG_MASK   (SECCOMP_FILTER_FLAG_TSYNC)
>> >> +#define SECCOMP_FILTER_FLAG_MASK   (\
>> >> +   SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>> >>
>> >>  #ifdef CONFIG_SECCOMP
>> >>
>> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>> >> index 0f238a4..c29a423 100644
>> >> --- a/include/uapi/linux/seccomp.h
>> >> +++ b/include/uapi/linux/seccomp.h
>> >> @@ -16,6 +16,7 @@
>> >>
>> >>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>> >>  #define SECCOMP_FILTER_FLAG_TSYNC  1
>> >> +#define SECCOMP_FILTER_FLAG_EBPF   (1 << 1)
>> >>
>> >>  /*
>> >>   * All BPF programs must return a 32-bit value.
>> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> >> index a2c5b32..9c6bea6 100644
>> >> --- a/kernel/seccomp.c
>> >> +++ b/kernel/seccomp.c
>> >> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>> >>
>> >> BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>> >>
>> >> -   /*
>> >> -* Installing a seccomp filter requires that the task has
>> >> -* CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> >> -* This avoids scenarios where unprivileged tasks can affect the
>> >> -* behavior of privileged children.
>> >> -*/
>> >> -   if (!task_no_new_privs(current) &&
>> >> -   security_capable_noaudit(current_cred(), current_user_ns(),
>> >> -CAP_SYS_ADMIN) != 0)
>> >> -   return ERR_PTR(-EACCES);
>> >> -
>> >> /* Allocate a new seccomp_filter */
>> >> sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>> >> if (!sfilter)
>> >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
>> >>

Re: eBPF / seccomp globals?

2015-09-03 Thread Kees Cook
On Thu, Sep 3, 2015 at 6:01 PM, Michael Tirado <mtirado...@gmail.com> wrote:
> Hiyall,
>
> I have created a seccomp white list filter for a program that launches
> other less trustworthy programs.  It's working great so far, but I
> have run into a little roadblock.  the launcher program needs to call
> execve as its final step, but that may not be present in the white
> list.  I am wondering if there is any way to use some sort of global
> variable that will be preserved between syscall filter calls so that I
> can allow only one execve, if not present in white list by
> incrementing a counter variable.
>
> I see that in Documentation/networking/filter.txt one of the registers
> is documented as being a pointer to struct sk_buff, in the seccomp
> context this is a pointer to struct seccomp_data  instead, right?  and
> the line about callee saved registers R6-R9  probably refers to them
> being saved across calls within that filter, and not calls between
> filters?
>
> My apologies if this is not the appropriate place to ask for help, but
> it is difficult to find useful information on how eBPF works, and is a
> bit confusing trying to figure out the differences between seccomp and
> net filters, and the old bpf code kicking around short of spending
> countless hours reading through all of it.  If anybody has some
> links to share I would be very grateful.  the only way I can think to
> make this work otherwise is to mount everything as MS_NOEXEC in the
> new namespace, but that just feels wrong.

For documentation, there's some great slides on seccomp from Plumber's
this year[1].

At present, there is no variable state beyond the syscall context (PC,
args) available to seccomp filters. The no_new_privs prctl was added
to reduce the risk of including execve in a filter's whitelist, but
that isn't as strong as the "exec once" feature you want.
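
As a small illustration of the no_new_privs step mentioned above, here is a
sketch that sets and reads back the flag with prctl(). It assumes a Linux
system whose headers define PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS. The
two wrapper names are made up, and installing the filter itself is left out.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Set no_new_privs for the calling thread; 0 on success, -1 on error. */
static int enable_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if no_new_privs is set, 0 if it is not, -1 on error. */
static int no_new_privs_is_set(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Once set, the flag cannot be cleared and is inherited across fork and
execve, which is what makes it useful before attaching a filter.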

What we did in Chrome OS was to use the "minijail" tool[2] to
LD_PRELOAD a .so that sets up the seccomp filter after the exec. It's
a bit of a hack, but works in well-defined environments. You are
talking about namespaces, though, so maybe minijail is worth a look?
It does that too and a whole lot more.

As for using maps via eBPF in seccomp, it's on the horizon, but it
comes with a lot of exposure that I haven't finished pondering, so I
don't think those features will be added soon.

-Kees

[1] 
http://man7.org/conf/lpc2015/limiting_kernel_attack_surface_with_seccomp-LPC_2015-Kerrisk.pdf
[2] see subdirectory "minijail" after "git clone
https://chromium.googlesource.com/chromiumos/platform2/;


-- 
Kees Cook
Chrome OS Security


Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Kees Cook
On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
<tycho.ander...@canonical.com> wrote:
> Hi all,
>
> Here's v5 of the seccomp filter c/r set. The individual patch notes have
> changes, but two highlights are:
>
> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>   will need to be built with that patch applied. This gets rid of two
>   incorrect patches in the previous series and is a nicer API.
>
> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>   same struct file across calls, so we still need a kcmp command. I've
>   narrowed the scope of the one being added to only compare seccomp fds.
>
> Thoughts welcome,

Hi, sorry I've been slow/busy. I'm finally reading through these threads.

Happy bit:
- avoiding eBPF and just saving the original filters makes things much easier.

Sad bit:
- inventing a new interface for seccompfds feels like massive overkill to me.

While Andy has big dreams, we're not presently doing seccompfd
monitoring, etc. There's no driving user for that kind of interface,
and accepting the maintenance burden of it only for CRIU seems unwise.

So, I'll go back to what I originally proposed at LSS (which it looks
like we're half way there now):

- save the original filter (done!)
- extract filters through a single special-purpose interface (looks
like ptrace is the way to go: root-only, stopped process, etc)
- compare filter content and issue TSYNCs to merge detected sibling
threads, since merging things that weren't merged before creates no
problems.

This means the parenting logic is heuristic, but it's entirely in
userspace, so the complexity burden doesn't live in seccomp which we,
by design, want to keep as simple as possible.

-Kees
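
The "compare filter content" step above could be done in userspace along
these lines. This is a sketch only: struct insn_model copies the field
layout of a classic BPF instruction (8 bytes, no padding) and filters_equal()
is a made-up name. Two dumped filters are treated as the same when they have
the same length and identical bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as a classic BPF instruction; no padding bytes. */
struct insn_model {
	uint16_t code; /* opcode */
	uint8_t  jt;   /* jump offset if true */
	uint8_t  jf;   /* jump offset if false */
	uint32_t k;    /* immediate operand */
};

/* Hypothetical helper: true if both programs are byte-for-byte equal. */
static bool filters_equal(const struct insn_model *a, size_t a_len,
			  const struct insn_model *b, size_t b_len)
{
	if (a_len != b_len)
		return false;
	if (a_len == 0)
		return true; /* two empty programs; nothing to compare */
	assert(a != NULL && b != NULL);
	return memcmp(a, b, a_len * sizeof(*a)) == 0;
}
```

Threads whose dumped filters compare equal are candidates for being merged
with TSYNC on restore. As noted above, that is a heuristic.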

-- 
Kees Cook
Chrome OS Security


Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Kees Cook
On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook <keesc...@chromium.org> wrote:
>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>> <tycho.ander...@canonical.com> wrote:
>>> Hi all,
>>>
>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>> changes, but two highlights are:
>>>
>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>>>   will need to be built with that patch applied. This gets rid of two
>>>   incorrect patches in the previous series and is a nicer API.
>>>
>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>>>   same struct file across calls, so we still need a kcmp command. I've
>>>   narrowed the scope of the one being added to only compare seccomp fds.
>>>
>>> Thoughts welcome,
>>
>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>
>> Happy bit:
>> - avoiding eBPF and just saving the original filters makes things much
>>   easier.
>>
>> Sad bit:
>> - inventing a new interface for seccompfds feels like massive overkill to me.
>>
>> While Andy has big dreams, we're not presently doing seccompfd
>> monitoring, etc. There's no driving user for that kind of interface,
>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>
>> So, I'll go back to what I originally proposed at LSS (which it looks
>> like we're half way there now):
>>
>> - save the original filter (done!)
>> - extract filters through a single special-purpose interface (looks
>> like ptrace is the way to go: root-only, stopped process, etc)
>> - compare filter content and issue TSYNCs to merge detected sibling
>> threads, since merging things that weren't merged before creates no
>> problems.
>>
>> This means the parenting logic is heuristic, but it's entirely in
>> userspace, so the complexity burden doesn't live in seccomp which we,
>> by design, want to keep as simple as possible.
>
> This is okay with me with a future-proofing caveat: I think that
> whatever reads out the filter should be clearly documented as
> returning some special error code that indicates that the filter it
> tried to read wasn't in the expected form.  That would happen for
> native eBPF filters, and it would also happen for seccomp monitors
> even if those monitors use classic BPF.

As in, it should have something like "give me BPF" and that'll start
failing when it's only eBPF in the future?

-Kees

-- 
Kees Cook
Chrome OS Security


Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Kees Cook
On Fri, Oct 2, 2015 at 3:04 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Fri, Oct 2, 2015 at 3:02 PM, Kees Cook <keesc...@chromium.org> wrote:
>> On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>>> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook <keesc...@chromium.org> wrote:
>>>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>>>> <tycho.ander...@canonical.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>>>> changes, but two highlights are:
>>>>>
>>>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/
>>>>>   and will need to be built with that patch applied. This gets rid of two
>>>>>   incorrect patches in the previous series and is a nicer API.
>>>>>
>>>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return
>>>>>   the same struct file across calls, so we still need a kcmp command. I've
>>>>>   narrowed the scope of the one being added to only compare seccomp fds.
>>>>>
>>>>> Thoughts welcome,
>>>>
>>>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>>>
>>>> Happy bit:
>>>> - avoiding eBPF and just saving the original filters makes things much
>>>>   easier.
>>>>
>>>> Sad bit:
>>>> - inventing a new interface for seccompfds feels like massive overkill to
>>>>   me.
>>>>
>>>> While Andy has big dreams, we're not presently doing seccompfd
>>>> monitoring, etc. There's no driving user for that kind of interface,
>>>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>>>
>>>> So, I'll go back to what I originally proposed at LSS (which it looks
>>>> like we're half way there now):
>>>>
>>>> - save the original filter (done!)
>>>> - extract filters through a single special-purpose interface (looks
>>>> like ptrace is the way to go: root-only, stopped process, etc)
>>>> - compare filter content and issue TSYNCs to merge detected sibling
>>>> threads, since merging things that weren't merged before creates no
>>>> problems.
>>>>
>>>> This means the parenting logic is heuristic, but it's entirely in
>>>> userspace, so the complexity burden doesn't live in seccomp which we,
>>>> by design, want to keep as simple as possible.
>>>
>>> This is okay with me with a future-proofing caveat: I think that
>>> whatever reads out the filter should be clearly documented as
>>> returning some special error code that indicates that the filter it
>>> tried to read wasn't in the expected form.  That would happen for
>>> native eBPF filters, and it would also happen for seccomp monitors
>>> even if those monitors use classic BPF.
>>
>> As in, it should have something like "give me BPF" and that'll start
>> failing when it's only eBPF in the future?
>
> Yes, but it might also start failing when, if my dreams come true, it's
> still classic BPF, but it's no longer a classic seccomp bpf filter
> layer with the semantics we expect today.  (E.g. if it's classic bpf
> but has a monitor attached, then the read should fail because
> restoring it without restoring the monitor will cause all kinds of
> mess.)

Ah-ha! Understood, and yeah, that seems fine.

Speaking of dreams -- what do you think about re-running seccomp in
the face of changed syscalls due to ptrace? Closing the ptrace hole
would be really nice.

-Kees

-- 
Kees Cook
Chrome OS Security


Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Kees Cook
On Fri, Oct 2, 2015 at 3:57 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 10/03/2015 12:44 AM, Tycho Andersen wrote:
>>
>> On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
>
> ...
>>
>> Ok, how about,
>>
>> struct sock_filter insns[BPF_MAXINSNS];
>> insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);
>
>
> Would also be good that when the storage buffer (insns) is NULL,
> it just returns you the number of sock_filter insns (or 0 when
> nothing attached).
>
> That would be consistent with classic socket filters (see
> sk_get_filter()), and user space could allocate a specific
> size instead of always passing in max insns.

Yes please. :)

>> when asking for the ith filter? It returns either the number of
>> instructions, -EINVAL if something was wrong (i, pid,
>> CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
>> succeed now, if/when the underlying filter was not created from a bpf
>> classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
>> picked this mostly based on what sounds nice.)

We can bikeshed the non-classic case when we need it, but I think
EINVAL is "not under seccomp", and ENOENT is "no such index".

-Kees
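
To make the calling convention discussed above concrete, here is a userspace
model of it. It is not the ptrace interface itself. get_filter(), struct
filter_model and struct insn_model are invented for the sketch. It returns
the instruction count, copies instructions only when a buffer is given,
and uses -EINVAL for "not under seccomp" and -ENOENT for "no such index".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as a classic BPF instruction. */
struct insn_model {
	uint16_t code;
	uint8_t  jt;
	uint8_t  jf;
	uint32_t k;
};

/* Hypothetical stand-in for one attached filter. */
struct filter_model {
	const struct insn_model *insns;
	size_t len;
};

/*
 * Hypothetical model of the read-out call.
 * filters == NULL models a task that is not under seccomp.
 */
static long get_filter(const struct filter_model *filters, size_t nfilters,
		       size_t idx, struct insn_model *out)
{
	if (filters == NULL)
		return -EINVAL; /* not under seccomp */
	if (idx >= nfilters)
		return -ENOENT; /* no such index */

	assert(filters[idx].insns != NULL);
	if (out != NULL) /* NULL buffer: caller only wants the count */
		memcpy(out, filters[idx].insns,
		       filters[idx].len * sizeof(*out));
	return (long)filters[idx].len;
}
```

A caller would first pass a NULL buffer to learn the size, allocate that many
instructions, then call again with the buffer, and stop iterating over
indices at the first -ENOENT.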

-- 
Kees Cook
Chrome OS Security


Re: [PATCH 1/6] ebpf: add a seccomp program type

2015-09-09 Thread Kees Cook
On Wed, Sep 9, 2015 at 9:52 AM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Wed, Sep 09, 2015 at 09:37:51AM -0700, Kees Cook wrote:
>> On Wed, Sep 9, 2015 at 9:09 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:
>> > On 09/09/2015 06:07 PM, Alexei Starovoitov wrote:
>> >>
>> >> On Wed, Sep 09, 2015 at 09:50:35AM -0600, Tycho Andersen wrote:
>> >
>> > [...]
>> >>>
>> >>> Thoughts?
>> >>
>> >>
>> >> Please do not add any per-instruction hacks. None of them are
>> >> necessary. Classic had to do extra ugly checks in seccomp only
>> >> because verifier wasn't flexible enough.
>> >> If you don't want to see any BPF_CALL in seccomp, just have
>> >> empty get_func_proto() callback for BPF_PROG_TYPE_SECCOMP
>> >> and verifier will reject all calls.
>> >> Currently we have only two non-generic instructions
>> >> LD_ABS and LD_IND that are available for sockets/TC only,
>> >> because these are legacy instructions and we had to make
>> >> exceptions for them.
>> >
>> > Yep, +1.
>>
>> Hrmpf. This adds to the cognitive load for accepting this patch
>> series. :P Now I have to convince myself that there is no additional
>> exposure to seccomp by using the entire set of eBPF instructions.
>> While I'm pretty sure it'll be fine, I really don't want to risk being
>> wrong and opening a hole here. I will spend some time looking at the
>> new eBPF instructions...
>
> note, as was discussed many times before, there is no pointer leak
> prevention pass yet, so eBPF is root only.
> Once the pass is complete it will prevent passing addresses to
> functions, storing them in maps and returning from the program.

Tycho, are you building new eBPF filters as the root user and then
attaching them later? I was imagining you were going to need this
entirely as non-root.

-Kees

-- 
Kees Cook
Chrome OS Security


Re: [PATCH 1/6] ebpf: add a seccomp program type

2015-09-09 Thread Kees Cook
On Wed, Sep 9, 2015 at 9:09 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 09/09/2015 06:07 PM, Alexei Starovoitov wrote:
>>
>> On Wed, Sep 09, 2015 at 09:50:35AM -0600, Tycho Andersen wrote:
>
> [...]
>>>
>>> Thoughts?
>>
>>
>> Please do not add any per-instruction hacks. None of them are
>> necessary. Classic had to do extra ugly checks in seccomp only
>> because verifier wasn't flexible enough.
>> If you don't want to see any BPF_CALL in seccomp, just have
>> empty get_func_proto() callback for BPF_PROG_TYPE_SECCOMP
>> and verifier will reject all calls.
>> Currently we have only two non-generic instructions
>> LD_ABS and LD_IND that are available for sockets/TC only,
>> because these are legacy instructions and we had to make
>> exceptions for them.
>
> Yep, +1.

Hrmpf. This adds to the cognitive load for accepting this patch
series. :P Now I have to convince myself that there is no additional
exposure to seccomp by using the entire set of eBPF instructions.
While I'm pretty sure it'll be fine, I really don't want to risk being
wrong and opening a hole here. I will spend some time looking at the
new eBPF instructions...

-Kees

-- 
Kees Cook
Chrome OS Security


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Kees Cook
On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
> On 10/5/15 2:00 PM, Kees Cook wrote:
>>
>> On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov<a...@plumgrid.com>
>> wrote:
>>>
>>> >In order to let unprivileged users load and execute eBPF programs
>>> >teach verifier to prevent pointer leaks.
>>> >Verifier will prevent
>>> >- any arithmetic on pointers
>>> >   (except R10+Imm which is used to compute stack addresses)
>>> >- comparison of pointers
>>> >- passing pointers to helper functions
>>> >- indirectly passing pointers in stack to helper functions
>>> >- returning pointer from bpf program
>>> >- storing pointers into ctx or maps
>>
>> Does the arithmetic restriction include using a pointer as an index to
>> a maps-based tail call? I'm still worried about pointer-based
>> side-effects.
>
>
> the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
> BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
> from the program side, so programs cannot see or manipulate
> those pointers.
> For the former only bpf_tail_call() is allowed that takes integer
> index and jumps to it. And the latter map accessed with

Okay, so I can't take a pointer, put it on the stack, take any part of it
back as an integer, and use it for a tail call?

Sounds like this is shaping up nicely! Thanks for adding all these checks.

-Kees

> bpf_perf_event_read() that also takes index only (this helper
> is not available to socket filters anyway).
> Also bpf_tail_call() can only jump to the program of the same type.
> So I'm quite certain it's safe.
>
> Yes, please ask questions and try to poke holes. Now it is time.
>



-- 
Kees Cook
Chrome OS Security


Re: [PATCH net] bridge: Only call /sbin/bridge-stp for the initial network namespace

2015-12-08 Thread Kees Cook
On Wed, Dec 2, 2015 at 8:50 PM, David Miller <da...@davemloft.net> wrote:
> From: ebied...@xmission.com (Eric W. Biederman)
> Date: Mon, 30 Nov 2015 15:38:15 -0600
>
>> + if (dev_net(br->dev) == &init_net)
>
> Please respin this using net_eq() as Hannes pointed out.

Sorry if I missed it: did this happen yet?

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH net] isdn: Partially revert debug format string usage clean up

2015-11-25 Thread Kees Cook
On Tue, Nov 24, 2015 at 10:47 PM, Christoph Biedl
<linux-kernel.b...@manchmal.in-ulm.de> wrote:
> Commit 35a4a57 ("isdn: clean up debug format string usage") introduced
> a safeguard to avoid accidental format string interpolation of data
> when calling debugl1 or HiSax_putstatus. This did however not take into
> account VHiSax_putstatus (called by HiSax_putstatus) does *not* call
> vsprintf if the head parameter is NULL - the format string is treated
> as plain text then instead. As a result, the string "%s" is processed
> literally, and the actual information is lost. This affects the isdnlog
> userspace program which stopped logging information since that commit.

Oh, that's weird, but yeah, these calls aren't expanding format
strings, so I'm fine with this change.

Acked-by: Kees Cook <keesc...@chromium.org>

On the other hand, VHiSax_putstatus contains an unchecked data buffer
overflow, since it uses tmpbuf without checking lengths at all:

	p = tmpbuf;
	if (head) {
		p += jiftime(p, jiffies);
		p += sprintf(p, " %s", head);
		p += vsprintf(p, fmt, args);
		*p++ = '\n';
		*p = 0;
		len = p - tmpbuf;
		p = tmpbuf;
	} else {
		p = fmt;
		len = strlen(fmt);
	}
	if (len > HISAX_STATUS_BUFSIZE) {
		spin_unlock_irqrestore(&cs->statlock, flags);
		printk(KERN_WARNING "HiSax: status overflow %d/%d\n",
		       len, HISAX_STATUS_BUFSIZE);
		return;
	}

It helpfully detects the overflow, but at this point it's too late:
the buffer (and things past it) have already been clobbered. Seems
reachable through capi_debug, callc_debug, but nothing jumps out at me
as passing too-long arguments, though there are some long call
chains that touch ioctls and tty stuff, which seems scary. :)
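
The ordering problem described above (format first, check the length
afterwards) can be illustrated in userspace. This is only a sketch under
stated assumptions, not the HiSax code: put_status_bounded() and
STATUS_BUFSIZE are hypothetical names invented for the example. It relies on
the standard C snprintf()/vsnprintf(), which never write more than the size
they are given and return the length that would have been needed, so
truncation is detected without anything being clobbered.

```c
#include <stdarg.h>
#include <stdio.h>

#define STATUS_BUFSIZE 32	/* hypothetical size, for illustration only */

/*
 * Format "head" followed by the expanded fmt into buf, which must hold
 * STATUS_BUFSIZE bytes.  Returns the number of characters written
 * (excluding the trailing NUL), or -1 if the message would not fit.
 * Nothing is ever written past the end of buf.
 */
static int put_status_bounded(char *buf, const char *head, const char *fmt, ...)
{
	va_list args;
	int used, n;

	used = snprintf(buf, STATUS_BUFSIZE, "%s", head);
	if (used < 0 || used >= STATUS_BUFSIZE)
		return -1;	/* head alone does not fit */

	va_start(args, fmt);
	n = vsnprintf(buf + used, STATUS_BUFSIZE - used, fmt, args);
	va_end(args);
	if (n < 0 || n >= STATUS_BUFSIZE - used)
		return -1;	/* truncated: reject instead of overflowing */

	return used + n;
}
```

The point of the sketch is only the order of operations: the size limit is
passed to the formatter, so the length check happens before any byte lands
outside the buffer.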

Also, though mitigated by needing DEB_DLOG_VERBOSE, there seems to be
an overflow in dlogframe too: it doesn't validate against
MAX_DLOG_SPACE. For example, this path:
GROUP_TEI
CTRL_SAPI
ftyp == 3
while ... adds to dp for length of skb
... heap buffer overflow

-Kees

>
> Fixes: 35a4a5733b0a ("isdn: clean up debug format string usage")
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Karsten Keil <i...@linux-pingi.de>
> Signed-off-by: Christoph Biedl <linux-kernel.b...@manchmal.in-ulm.de>
> ---
>  drivers/isdn/hisax/config.c  | 2 +-
>  drivers/isdn/hisax/hfc_pci.c | 2 +-
>  drivers/isdn/hisax/hfc_sx.c  | 2 +-
>  drivers/isdn/hisax/q931.c| 6 +++---
>  4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/isdn/hisax/config.c b/drivers/isdn/hisax/config.c
> index b33f53b..bf04d2a 100644
> --- a/drivers/isdn/hisax/config.c
> +++ b/drivers/isdn/hisax/config.c
> @@ -1896,7 +1896,7 @@ static void EChannel_proc_rcv(struct hisax_d_if *d_if)
> ptr--;
> *ptr++ = '\n';
> *ptr = 0;
> -   HiSax_putstatus(cs, NULL, "%s", cs->dlog);
> +   HiSax_putstatus(cs, NULL, cs->dlog);
> } else
> HiSax_putstatus(cs, "LogEcho: ",
> "warning Frame too big (%d)",
> diff --git a/drivers/isdn/hisax/hfc_pci.c b/drivers/isdn/hisax/hfc_pci.c
> index 4a48255..90449e1 100644
> --- a/drivers/isdn/hisax/hfc_pci.c
> +++ b/drivers/isdn/hisax/hfc_pci.c
> @@ -901,7 +901,7 @@ Begin:
> ptr--;
> *ptr++ = '\n';
> *ptr = 0;
> -   HiSax_putstatus(cs, NULL, "%s", 
> cs->dlog);
> +   HiSax_putstatus(cs, NULL, cs->dlog);
> } else
> HiSax_putstatus(cs, "LogEcho: ", 
> "warning Frame too big (%d)", total - 3);
> }
> diff --git a/drivers/isdn/hisax/hfc_sx.c b/drivers/isdn/hisax/hfc_sx.c
> index b1fad81..13b2151 100644
> --- a/drivers/isdn/hisax/hfc_sx.c
> +++ b/drivers/isdn/hisax/hfc_sx.c
> @@ -674,7 +674,7 @@ receive_emsg(struct IsdnCardState *cs)
> ptr--;
> *ptr++ = '\n';
> *ptr = 0;
> -   HiSax_putstatus(cs, NULL, "%s", 
> cs->dlog);
> +   HiSax_putstatus(cs,

Re: user controllable usermodehelper in br_stp_if.c

2015-11-30 Thread Kees Cook
On Sun, Nov 29, 2015 at 2:43 PM, Richard Weinberger <rich...@nod.at> wrote:
> Hi!
>
> By spawning new network and user namespaces an unprivileged user
> is able to execute /sbin/bridge-stp within the initial mount namespace
> with global root rights.
> While this cannot directly be used to break out of a container or gain
> global root rights it could be used by exploit writers as a valuable
> building block.
>
> e.g.
> $ unshare -U -r -n /bin/sh
> $ brctl addbr br0
> $ brctl stp br0 on # this will execute /sbin/bridge-stp
>
> As this mechanism clearly cannot work with containers and seems to be legacy
> code, I suggest not calling call_usermodehelper() at all if we're not in the
> initial user namespace.
> What do you think?

I'm not familiar with how bridge-stp is expected to operate with a
network namespace, but if it's meaningless, then yeah, that seems like
a reasonable change. Can you send a patch? (Also, if it's legacy code,
maybe it could be turned off entirely, not just for containers?)

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


[PATCH v2 1/4] lib: move strtobool to kstrtobool

2016-02-04 Thread Kees Cook
Create the kstrtobool_from_user helper and move the strtobool logic into
the new kstrtobool (matching all the other kstrto* functions). Provide
an inline wrapper for existing strtobool callers.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 include/linux/kernel.h |  3 +++
 include/linux/string.h |  6 +-
 lib/kstrtox.c  | 35 +++
 lib/string.c   | 29 -
 4 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index f31638c6e873..cdc25f47a23f 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -357,6 +357,7 @@ int __must_check kstrtou16(const char *s, unsigned int 
base, u16 *res);
 int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
 int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
 int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
+int __must_check kstrtobool(const char *s, unsigned int base, bool *res);
 
 int __must_check kstrtoull_from_user(const char __user *s, size_t count, 
unsigned int base, unsigned long long *res);
 int __must_check kstrtoll_from_user(const char __user *s, size_t count, 
unsigned int base, long long *res);
@@ -368,6 +369,8 @@ int __must_check kstrtou16_from_user(const char __user *s, 
size_t count, unsigne
 int __must_check kstrtos16_from_user(const char __user *s, size_t count, 
unsigned int base, s16 *res);
 int __must_check kstrtou8_from_user(const char __user *s, size_t count, 
unsigned int base, u8 *res);
 int __must_check kstrtos8_from_user(const char __user *s, size_t count, 
unsigned int base, s8 *res);
+int __must_check kstrtobool_from_user(const char __user *s, size_t count,
+ unsigned int base, bool *res);
 
 static inline int __must_check kstrtou64_from_user(const char __user *s, 
size_t count, unsigned int base, u64 *res)
 {
diff --git a/include/linux/string.h b/include/linux/string.h
index 9eebc66d957a..d2fb21b1081d 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -128,7 +128,11 @@ extern char **argv_split(gfp_t gfp, const char *str, int 
*argcp);
 extern void argv_free(char **argv);
 
 extern bool sysfs_streq(const char *s1, const char *s2);
-extern int strtobool(const char *s, bool *res);
+extern int kstrtobool(const char *s, unsigned int base, bool *res);
+static inline int strtobool(const char *s, bool *res)
+{
+   return kstrtobool(s, 0, res);
+}
 
 #ifdef CONFIG_BINARY_PRINTF
 int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index 94be244e8441..e18f088704d7 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -321,6 +321,40 @@ int kstrtos8(const char *s, unsigned int base, s8 *res)
 }
 EXPORT_SYMBOL(kstrtos8);
 
+/**
+ * kstrtobool - convert common user inputs into boolean values
+ * @s: input string
+ * @base: ignored
+ * @res: result
+ *
+ * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
+ * Otherwise it will return -EINVAL.  Value pointed to by res is
+ * updated upon finding a match.
+ */
+int kstrtobool(const char *s, unsigned int base, bool *res)
+{
+   if (!s)
+   return -EINVAL;
+
+   switch (s[0]) {
+   case 'y':
+   case 'Y':
+   case '1':
+   *res = true;
+   return 0;
+   case 'n':
+   case 'N':
+   case '0':
+   *res = false;
+   return 0;
+   default:
+   break;
+   }
+
+   return -EINVAL;
+}
+EXPORT_SYMBOL(kstrtobool);
+
 #define kstrto_from_user(f, g, type)   \
 int f(const char __user *s, size_t count, unsigned int base, type *res)
\
 {  \
@@ -345,3 +379,4 @@ kstrto_from_user(kstrtou16_from_user,   kstrtou16,  
u16);
 kstrto_from_user(kstrtos16_from_user,  kstrtos16,  s16);
 kstrto_from_user(kstrtou8_from_user,   kstrtou8,   u8);
 kstrto_from_user(kstrtos8_from_user,   kstrtos8,   s8);
+kstrto_from_user(kstrtobool_from_user, kstrtobool, bool);
diff --git a/lib/string.c b/lib/string.c
index 0323c0d5629a..1a90db9bc6e1 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -630,35 +630,6 @@ bool sysfs_streq(const char *s1, const char *s2)
 }
 EXPORT_SYMBOL(sysfs_streq);
 
-/**
- * strtobool - convert common user inputs into boolean values
- * @s: input string
- * @res: result
- *
- * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
- * Otherwise it will return -EINVAL.  Value pointed to by res is
- * updated upon finding a match.
- */
-int strtobool(const char *s, bool *res)
-{
-   switch (s[0]) {
-   case 'y':
-   case 'Y':
-   case '1':
-   *res = true;
-   break;
-   case 'n':
-   case 'N':
-   case '0':
-   *res = false;
-   break;
-   d

[PATCH v2 4/4] param: convert some "on"/"off" users to strtobool

2016-02-04 Thread Kees Cook
This changes several users of manual "on"/"off" parsing to use strtobool.
(Which means they will now parse y/n/1/0 meaningfully too.)

Signed-off-by: Kees Cook <keesc...@chromium.org>
Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
Acked-by: Michael Ellerman <m...@ellerman.id.au>
Cc: x...@kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
---
 arch/powerpc/kernel/rtasd.c  |  9 ++---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 ++
 arch/s390/kernel/time.c  |  8 ++--
 arch/s390/kernel/topology.c  |  7 ++-
 arch/x86/kernel/aperture_64.c| 12 ++--
 include/linux/tick.h |  2 +-
 kernel/time/hrtimer.c| 10 ++
 kernel/time/tick-sched.c | 10 ++
 8 files changed, 15 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 5a2c049c1c61..567ed5a2f43a 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -49,7 +49,7 @@ static unsigned int rtas_error_log_buffer_max;
 static unsigned int event_scan;
 static unsigned int rtas_event_scan_rate;
 
-static int full_rtas_msgs = 0;
+static bool full_rtas_msgs;
 
 /* Stop logging to nvram after first fatal error */
 static int logging_enabled; /* Until we initialize everything,
@@ -592,11 +592,6 @@ __setup("surveillance=", surveillance_setup);
 
 static int __init rtasmsgs_setup(char *str)
 {
-   if (strcmp(str, "on") == 0)
-   full_rtas_msgs = 1;
-   else if (strcmp(str, "off") == 0)
-   full_rtas_msgs = 0;
-
-   return 1;
+   return kstrtobool(str, 0, &full_rtas_msgs);
 }
 __setup("rtasmsgs=", rtasmsgs_setup);
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 32274f72fe3f..b9787cae4108 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -47,20 +47,14 @@ static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = 
CPU_STATE_OFFLINE;
 
 static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
 
-static int cede_offline_enabled __read_mostly = 1;
+static bool cede_offline_enabled __read_mostly = true;
 
 /*
  * Enable/disable cede_offline when available.
  */
 static int __init setup_cede_offline(char *str)
 {
-   if (!strcmp(str, "off"))
-   cede_offline_enabled = 0;
-   else if (!strcmp(str, "on"))
-   cede_offline_enabled = 1;
-   else
-   return 0;
-   return 1;
+   return kstrtobool(str, 0, &cede_offline_enabled);
 }
 
 __setup("cede_offline=", setup_cede_offline);
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 99f84ac31307..dff6ce1b84b2 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -1433,7 +1433,7 @@ device_initcall(etr_init_sysfs);
 /*
  * Server Time Protocol (STP) code.
  */
-static int stp_online;
+static bool stp_online;
 static struct stp_sstpi stp_info;
 static void *stp_page;
 
@@ -1444,11 +1444,7 @@ static struct timer_list stp_timer;
 
 static int __init early_parse_stp(char *p)
 {
-   if (strncmp(p, "off", 3) == 0)
-   stp_online = 0;
-   else if (strncmp(p, "on", 2) == 0)
-   stp_online = 1;
-   return 0;
+   return kstrtobool(p, 0, &stp_online);
 }
 early_param("stp", early_parse_stp);
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 40b8102fdadb..5d8a80651f61 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -37,7 +37,7 @@ static void set_topology_timer(void);
 static void topology_work_fn(struct work_struct *work);
 static struct sysinfo_15_1_x *tl_info;
 
-static int topology_enabled = 1;
+static bool topology_enabled = true;
 static DECLARE_WORK(topology_work, topology_work_fn);
 
 /*
@@ -444,10 +444,7 @@ static const struct cpumask *cpu_book_mask(int cpu)
 
 static int __init early_parse_topology(char *p)
 {
-   if (strncmp(p, "off", 3))
-   return 0;
-   topology_enabled = 0;
-   return 0;
+   return kstrtobool(p, 0, &topology_enabled);
 }
 early_param("topology", early_parse_topology);
 
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 6e85f713641d..6b423754083a 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -227,19 +227,11 @@ static u32 __init search_agp_bridge(u32 *order, int 
*valid_agp)
return 0;
 }
 
-static int gart_fix_e820 __initdata = 1;
+static bool gart_fix_e820 __initdata = true;
 
 static int __init parse_gart_mem(char *p)
 {
-   if (!p)
-   return -EINVAL;
-
-   if (!strncmp(p, "off", 3))
-   gart_fix_e820 = 0;

[PATCH v2 3/4] lib: add "on"/"off" support to kstrtobool

2016-02-04 Thread Kees Cook
Add support for "on" and "off" when converting to boolean.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 lib/kstrtox.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index e18f088704d7..09e83a19a96d 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -347,6 +347,20 @@ int kstrtobool(const char *s, unsigned int base, bool *res)
case '0':
*res = false;
return 0;
+   case 'o':
+   case 'O':
+   switch (s[1]) {
+   case 'n':
+   case 'N':
+   *res = true;
+   return 0;
+   case 'f':
+   case 'F':
+   *res = false;
+   return 0;
+   default:
+   break;
+   }
default:
break;
}
-- 
2.6.3
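
For readers who want to try the parsing rules outside the kernel, here is a
minimal userspace sketch of the behaviour after patches 1/4 and 3/4. It is an
approximation under stated assumptions: parse_bool() is a hypothetical name,
the ignored "base" argument is dropped, and EINVAL comes from the POSIX
<errno.h> rather than from kernel headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace approximation of the parser: accepts a leading 'Yy1' or 'Nn0',
 * plus "on"/"off" in any case.  Returns 0 and updates *res on a match,
 * -EINVAL otherwise.
 */
static int parse_bool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* Looks at a second character, so s must be NUL-terminated. */
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		default:
			break;
		}
		break;
	default:
		break;
	}

	return -EINVAL;
}
```

Because the 'o' case reads s[1], a caller that passes the address of a lone
char no longer supplies enough bytes, which is the situation patch 2/4
addresses.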



[PATCH v2 2/4] lib: update single-char callers of strtobool

2016-02-04 Thread Kees Cook
Some callers of strtobool were passing a pointer to unterminated strings.
In preparation of adding multi-character processing to kstrtobool, update
the callers to not pass single-character pointers, and switch to using the
new kstrtobool_from_user helper where possible.

Signed-off-by: Kees Cook <keesc...@chromium.org>
Cc: Amitkumar Karwar <akar...@marvell.com>
Cc: Nishant Sarmukadam <nisha...@marvell.com>
Cc: Kalle Valo <kv...@codeaurora.org>
Cc: Steve French <sfre...@samba.org>
Cc: linux-c...@vger.kernel.org
---
 drivers/net/wireless/marvell/mwifiex/debugfs.c | 10 ++---
 fs/cifs/cifs_debug.c   | 58 +++---
 fs/cifs/cifs_debug.h   |  2 +-
 fs/cifs/cifsfs.c   |  6 +--
 fs/cifs/cifsglob.h |  4 +-
 5 files changed, 26 insertions(+), 54 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
b/drivers/net/wireless/marvell/mwifiex/debugfs.c
index 0b9c580af988..bd061b02bc04 100644
--- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
+++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
@@ -880,14 +880,12 @@ mwifiex_reset_write(struct file *file,
 {
struct mwifiex_private *priv = file->private_data;
struct mwifiex_adapter *adapter = priv->adapter;
-   char cmd;
bool result;
+   int rc;
 
-   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
-   return -EFAULT;
-
-   if (strtobool(&cmd, &result))
-   return -EINVAL;
+   rc = kstrtobool_from_user(ubuf, count, 0, &result);
+   if (rc)
+   return rc;
 
if (!result)
return -EINVAL;
diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index 50b268483302..6ee59abcb69b 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -255,7 +255,6 @@ static const struct file_operations 
cifs_debug_data_proc_fops = {
 static ssize_t cifs_stats_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
bool bv;
int rc;
struct list_head *tmp1, *tmp2, *tmp3;
@@ -263,11 +262,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
struct cifs_ses *ses;
struct cifs_tcon *tcon;
 
-   rc = get_user(c, buffer);
-   if (rc)
-   return rc;
-
-   if (strtobool(&c, &bv) == 0) {
+   rc = kstrtobool_from_user(buffer, count, 0, &bv);
+   if (rc == 0) {
 #ifdef CONFIG_CIFS_STATS2
atomic_set(, 0);
atomic_set(, 0);
@@ -290,6 +286,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
}
}
spin_unlock(&cifs_tcp_ses_lock);
+   } else {
+   return rc;
}
 
return count;
@@ -433,17 +431,17 @@ static int cifsFYI_proc_open(struct inode *inode, struct 
file *file)
 static ssize_t cifsFYI_proc_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
 {
-   char c;
+   char c[2] = { '\0' };
bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = get_user(c[0], buffer);
if (rc)
return rc;
-   if (strtobool(&c, &bv) == 0)
+   if (strtobool(c, &bv) == 0)
cifsFYI = bv;
-   else if ((c > '1') && (c <= '9'))
-   cifsFYI = (int) (c - '0'); /* see cifs_debug.h for meanings */
+   else if ((c[0] > '1') && (c[0] <= '9'))
+   cifsFYI = (int) (c[0] - '0'); /* see cifs_debug.h for meanings 
*/
 
return count;
 }
@@ -471,20 +469,12 @@ static int cifs_linux_ext_proc_open(struct inode *inode, 
struct file *file)
 static ssize_t cifs_linux_ext_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
-   bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = kstrtobool_from_user(buffer, count, 0, &linuxExtEnabled);
if (rc)
return rc;
 
-   rc = strtobool(&c, &bv);
-   if (rc)
-   return rc;
-
-   linuxExtEnabled = bv;
-
return count;
 }
 
@@ -511,20 +501,12 @@ static int cifs_lookup_cache_proc_open(struct inode 
*inode, struct file *file)
 static ssize_t cifs_lookup_cache_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
-   bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = kstrtobool_from_user(buffer, count, 0, &lookupCacheEnabled);
if (rc)
return rc;
 
-   rc = strtobool(&c, &bv);
-   if (rc)
-   return rc;
-
-   lookupCacheEnabled = bv;
-
return count;
 }
 
@@ -551,20 +533,12 @@ static int traceSMB_proc_open(struct inode *inode, struct 
file *file)
 static ssize_t traceSMB_proc_write(struct file *file, const char __user 
*buffer,
size_t count, loff_t *ppos)
 {
-   ch

Re: [PATCH v2 1/4] lib: move strtobool to kstrtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 2:43 PM, Andy Shevchenko
<andy.shevche...@gmail.com> wrote:
> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook <keesc...@chromium.org> wrote:
>> Create the kstrtobool_from_user helper and move the strtobool logic into
>> the new kstrtobool (matching all the other kstrto* functions). Provide
>> an inline wrapper for existing strtobool callers.
>>
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>
> Reviewed-by: Andy Shevchenko <andy.shevche...@gmail.com>
>
> One minor below.

Thanks!

>
>> ---
>>  include/linux/kernel.h |  3 +++
>>  include/linux/string.h |  6 +-
>>  lib/kstrtox.c  | 35 +++
>>  lib/string.c   | 29 -
>>  4 files changed, 43 insertions(+), 30 deletions(-)
>>
>> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
>> index f31638c6e873..cdc25f47a23f 100644
>> --- a/include/linux/kernel.h
>> +++ b/include/linux/kernel.h
>> @@ -357,6 +357,7 @@ int __must_check kstrtou16(const char *s, unsigned int 
>> base, u16 *res);
>>  int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
>>  int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
>>  int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
>> +int __must_check kstrtobool(const char *s, unsigned int base, bool *res);
>>
>>  int __must_check kstrtoull_from_user(const char __user *s, size_t count, 
>> unsigned int base, unsigned long long *res);
>>  int __must_check kstrtoll_from_user(const char __user *s, size_t count, 
>> unsigned int base, long long *res);
>> @@ -368,6 +369,8 @@ int __must_check kstrtou16_from_user(const char __user 
>> *s, size_t count, unsigne
>>  int __must_check kstrtos16_from_user(const char __user *s, size_t count, 
>> unsigned int base, s16 *res);
>>  int __must_check kstrtou8_from_user(const char __user *s, size_t count, 
>> unsigned int base, u8 *res);
>>  int __must_check kstrtos8_from_user(const char __user *s, size_t count, 
>> unsigned int base, s8 *res);
>
>> +int __must_check kstrtobool_from_user(const char __user *s, size_t count,
>> + unsigned int base, bool *res);
>
> We already are using long lines here, perhaps do the same?

I went back and forth on that, and decided that between checkpatch
yelling at me, and trying to be an agent of less entropy, I wrapped
the definition. I am fine either way, though.

-Kees

>
>>
>>  static inline int __must_check kstrtou64_from_user(const char __user *s, 
>> size_t count, unsigned int base, u64 *res)
>>  {
>> diff --git a/include/linux/string.h b/include/linux/string.h
>> index 9eebc66d957a..d2fb21b1081d 100644
>> --- a/include/linux/string.h
>> +++ b/include/linux/string.h
>> @@ -128,7 +128,11 @@ extern char **argv_split(gfp_t gfp, const char *str, 
>> int *argcp);
>>  extern void argv_free(char **argv);
>>
>>  extern bool sysfs_streq(const char *s1, const char *s2);
>> -extern int strtobool(const char *s, bool *res);
>> +extern int kstrtobool(const char *s, unsigned int base, bool *res);
>> +static inline int strtobool(const char *s, bool *res)
>> +{
>> +   return kstrtobool(s, 0, res);
>> +}
>>
>>  #ifdef CONFIG_BINARY_PRINTF
>>  int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
>> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
>> index 94be244e8441..e18f088704d7 100644
>> --- a/lib/kstrtox.c
>> +++ b/lib/kstrtox.c
>> @@ -321,6 +321,40 @@ int kstrtos8(const char *s, unsigned int base, s8 *res)
>>  }
>>  EXPORT_SYMBOL(kstrtos8);
>>
>> +/**
>> + * kstrtobool - convert common user inputs into boolean values
>> + * @s: input string
>> + * @base: ignored
>> + * @res: result
>> + *
>> + * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
>> + * Otherwise it will return -EINVAL.  Value pointed to by res is
>> + * updated upon finding a match.
>> + */
>> +int kstrtobool(const char *s, unsigned int base, bool *res)
>> +{
>> +   if (!s)
>> +   return -EINVAL;
>> +
>> +   switch (s[0]) {
>> +   case 'y':
>> +   case 'Y':
>> +   case '1':
>> +   *res = true;
>> +   return 0;
>> +   case 'n':
>> +   case 'N':
>> +   case '0':
>> +   *res = false;
>> +   return 0;
>> +   default:
>> +   break;
>> +   }
>> +
>

[PATCH v2 0/4] lib: add "on" and "off" to strtobool

2016-02-04 Thread Kees Cook
This consolidates logic for handling "on"/"off" parsing for bools into
the strtobool function, by way of moving it into kstrtobool (with helpers),
and updating various callers.

 arch/powerpc/kernel/rtasd.c|9 ---
 arch/powerpc/platforms/pseries/hotplug-cpu.c   |   10 
 arch/s390/kernel/time.c|8 ---
 arch/s390/kernel/topology.c|7 ---
 arch/x86/kernel/aperture_64.c  |   12 -
 drivers/net/wireless/marvell/mwifiex/debugfs.c |   10 +---
 fs/cifs/cifs_debug.c   |   58 ++---
 fs/cifs/cifs_debug.h   |2 
 fs/cifs/cifsfs.c   |6 +-
 fs/cifs/cifsglob.h |4 -
 include/linux/kernel.h |3 +
 include/linux/string.h |6 ++
 include/linux/tick.h   |2 
 kernel/time/hrtimer.c  |   10 
 kernel/time/tick-sched.c   |   10 
 lib/kstrtox.c  |   49 +
 lib/string.c   |   29 
 17 files changed, 98 insertions(+), 137 deletions(-)

-Kees



Re: [PATCH v2 2/4] lib: update single-char callers of strtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 2:59 PM, Andy Shevchenko
<andy.shevche...@gmail.com> wrote:
> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook <keesc...@chromium.org> wrote:
>> Some callers of strtobool were passing a pointer to unterminated strings.
>> In preparation of adding multi-character processing to kstrtobool, update
>> the callers to not pass single-character pointers, and switch to using the
>> new kstrtobool_from_user helper where possible.
>
> Looks much better now!
> My comment below.
>
>>
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>> Cc: Amitkumar Karwar <akar...@marvell.com>
>> Cc: Nishant Sarmukadam <nisha...@marvell.com>
>> Cc: Kalle Valo <kv...@codeaurora.org>
>> Cc: Steve French <sfre...@samba.org>
>> Cc: linux-c...@vger.kernel.org
>> ---
>>  drivers/net/wireless/marvell/mwifiex/debugfs.c | 10 ++---
>>  fs/cifs/cifs_debug.c   | 58 
>> +++---
>>  fs/cifs/cifs_debug.h   |  2 +-
>>  fs/cifs/cifsfs.c   |  6 +--
>>  fs/cifs/cifsglob.h |  4 +-
>>  5 files changed, 26 insertions(+), 54 deletions(-)
>>
>> diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
>> b/drivers/net/wireless/marvell/mwifiex/debugfs.c
>> index 0b9c580af988..bd061b02bc04 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
>> @@ -880,14 +880,12 @@ mwifiex_reset_write(struct file *file,
>>  {
>> struct mwifiex_private *priv = file->private_data;
>> struct mwifiex_adapter *adapter = priv->adapter;
>> -   char cmd;
>> bool result;
>> +   int rc;
>>
>> -   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
>> -   return -EFAULT;
>> -
>> -   if (strtobool(&cmd, &result))
>> -   return -EINVAL;
>> +   rc = kstrtobool_from_user(ubuf, count, 0, &result);
>> +   if (rc)
>> +   return rc;
>>
>> if (!result)
>> return -EINVAL;
>> diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
>> index 50b268483302..6ee59abcb69b 100644
>> --- a/fs/cifs/cifs_debug.c
>> +++ b/fs/cifs/cifs_debug.c
>> @@ -255,7 +255,6 @@ static const struct file_operations 
>> cifs_debug_data_proc_fops = {
>>  static ssize_t cifs_stats_proc_write(struct file *file,
>> const char __user *buffer, size_t count, loff_t *ppos)
>>  {
>> -   char c;
>> bool bv;
>> int rc;
>> struct list_head *tmp1, *tmp2, *tmp3;
>> @@ -263,11 +262,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
>> struct cifs_ses *ses;
>> struct cifs_tcon *tcon;
>>
>> -   rc = get_user(c, buffer);
>> -   if (rc)
>> -   return rc;
>> -
>> -   if (strtobool(&c, &bv) == 0) {
>> +   rc = kstrtobool_from_user(buffer, count, 0, &bv);
>> +   if (rc == 0) {
>>  #ifdef CONFIG_CIFS_STATS2
>> atomic_set(, 0);
>> atomic_set(, 0);
>> @@ -290,6 +286,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
>> }
>> }
>> spin_unlock(&cifs_tcp_ses_lock);
>> +   } else {
>> +   return rc;
>> }
>>
>> return count;
>> @@ -433,17 +431,17 @@ static int cifsFYI_proc_open(struct inode *inode, 
>> struct file *file)
>>  static ssize_t cifsFYI_proc_write(struct file *file, const char __user 
>> *buffer,
>> size_t count, loff_t *ppos)
>>  {
>> -   char c;
>> +   char c[2] = { '\0' };
>> bool bv;
>> int rc;
>>
>> -   rc = get_user(c, buffer);
>> +   rc = get_user(c[0], buffer);
>
>> if (rc)
>> return rc;
>> -   if (strtobool(&c, &bv) == 0)
>> +   if (strtobool(c, &bv) == 0)
>> cifsFYI = bv;
>> -   else if ((c > '1') && (c <= '9'))
>> -   cifsFYI = (int) (c - '0'); /* see cifs_debug.h for meanings 
>> */
>> +   else if ((c[0] > '1') && (c[0] <= '9'))
>> +   cifsFYI = (int) (c[0] - '0'); /* see cifs_debug.h for 
>> meanings */
>>
>> return count;
>>  }
>> @@ -471,20 +469,12 @@ static int cifs_linux_ext_proc_open(struct inode 
>> *inode, struct file *file)
>>  static ssize_t cifs_lin

Re: [PATCH v2 3/4] lib: add "on"/"off" support to kstrtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 3:00 PM, Andy Shevchenko
<andy.shevche...@gmail.com> wrote:
> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook <keesc...@chromium.org> wrote:
>> Add support for "on" and "off" when converting to boolean.
>>
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>> ---
>>  lib/kstrtox.c | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
>> index e18f088704d7..09e83a19a96d 100644
>> --- a/lib/kstrtox.c
>> +++ b/lib/kstrtox.c
>> @@ -347,6 +347,20 @@ int kstrtobool(const char *s, unsigned int base, bool 
>> *res)
>
> Forgot to update the description?

Argh, thank you. Good eye. Sent another update.

-Kees

>
>> case '0':
>> *res = false;
>> return 0;
>> +   case 'o':
>> +   case 'O':
>> +   switch (s[1]) {
>> +   case 'n':
>> +   case 'N':
>> +   *res = true;
>> +   return 0;
>> +   case 'f':
>> +   case 'F':
>> +   *res = false;
>> +   return 0;
>> +   default:
>> +   break;
>> +   }
>> default:
>> break;
>> }
>> --
>> 2.6.3
>>
>
>
>
> --
> With Best Regards,
> Andy Shevchenko



-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH v2 4/4] param: convert some "on"/"off" users to strtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 4:11 PM, Kees Cook <keesc...@chromium.org> wrote:
> On Thu, Feb 4, 2016 at 3:04 PM, Andy Shevchenko
> <andy.shevche...@gmail.com> wrote:
>> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook <keesc...@chromium.org> wrote:
>>> This changes several users of manual "on"/"off" parsing to use strtobool.
>>> (Which means they will now parse y/n/1/0 meaningfully too.)
>>>
>>
>> I like this change, but can you carefully check the acceptance of the
>> returned value?
>> Briefly I saw 1 or 0 as okay in different places.
>
> Maybe I missed something, but I think this is actually a bug fix. The
> two cases are early_param and __setup:
>
> For early_param, the functions are called when walking the command
> line in do_early_param via parse_args in parse_early_options. Any
> non-zero return values produce a warning (in do_early_param not
> parse_args). So this is a bug fix, since the function I touched would
> (almost) always return 0, even with bad values (i.e. fixes unreported
> bad arguments):
>
> early_param  early_parse_stp       always 0
> early_param  early_parse_topology  always 0
> early_param  parse_gart_mem        always 0 unless !p (then -EINVAL)
>
> For __setup, these are handled by obsolete_checksetup via
> unknown_bootoption via parse_args in start_kernel, as a way to merge
> __setup calls that should really be in param (i.e. non-early __setup).
> Return values are bubbled up into parse_args and hit:
>
>         default:
>                 pr_err("%s: `%s' invalid for parameter `%s'\n",
>                        doing, val ?: "", param);
>                 break;
>
> So this is also a bug fix, since these __setup functions returned inverted
> values or always failed:
>
> __setup  rtasmsgs_setup      always 1
> __setup  setup_cede_offline  1 on success, otherwise 0
> __setup  setup_hrtimer_hres  1 on success, otherwise 0
> __setup  setup_tick_nohz     1 on success, otherwise 0
>
> So if you specified any of these, they would trigger a bogus "invalid
> parameter" report.
>
> I will double-check...

I am wrong! __setup functions (as handled by unknown_bootoption) need
to return 1, or they end up in the init environment. I will send a
fix...
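
To make the two conventions concrete, here is a minimal userspace sketch.
It assumes a compact stand-in parser (parse_bool) rather than the kernel's
kstrtobool, and the *_sketch handler names are hypothetical. It only
illustrates the return values discussed above: an early_param handler
passes 0 or a negative errno straight through, while a __setup handler
has to turn success into 1.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Compact userspace stand-in for kstrtobool(): accepts a leading
 * y/Y/1 or n/N/0, plus "on"/"off" in any case. Not the kernel code.
 */
static int parse_bool(const char *s, bool *res)
{
	bool o, yes, no;

	if (!s)
		return -EINVAL;

	/* s[1] is only read when s[0] is 'o'/'O', so it is never past the NUL. */
	o = (s[0] == 'o' || s[0] == 'O');
	yes = s[0] == 'y' || s[0] == 'Y' || s[0] == '1' ||
	      (o && (s[1] == 'n' || s[1] == 'N'));
	no = s[0] == 'n' || s[0] == 'N' || s[0] == '0' ||
	     (o && (s[1] == 'f' || s[1] == 'F'));

	if (!yes && !no)
		return -EINVAL;
	*res = yes;
	return 0;
}

static bool stp_online;
static bool cede_offline_enabled = true;

/* early_param convention: 0 on success, non-zero is reported as an error. */
static int early_parse_stp_sketch(const char *p)
{
	return parse_bool(p, &stp_online);
}

/* __setup convention: 1 means handled, 0 means the option was not handled. */
static int setup_cede_offline_sketch(const char *str)
{
	return parse_bool(str, &cede_offline_enabled) == 0;
}
```

With this sketch, setup_cede_offline_sketch("off") returns 1 and clears
the flag, while early_parse_stp_sketch("maybe") returns -EINVAL.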

-Kees

>
> -Kees
>
>>
>>
>>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>>> Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
>>> Acked-by: Michael Ellerman <m...@ellerman.id.au>
>>> Cc: x...@kernel.org
>>> Cc: linuxppc-...@lists.ozlabs.org
>>> Cc: linux-s...@vger.kernel.org
>>> ---
>>>  arch/powerpc/kernel/rtasd.c  |  9 ++---
>>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 ++
>>>  arch/s390/kernel/time.c  |  8 ++--
>>>  arch/s390/kernel/topology.c  |  7 ++-
>>>  arch/x86/kernel/aperture_64.c| 12 ++--
>>>  include/linux/tick.h |  2 +-
>>>  kernel/time/hrtimer.c| 10 ++
>>>  kernel/time/tick-sched.c | 10 ++
>>>  8 files changed, 15 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
>>> index 5a2c049c1c61..567ed5a2f43a 100644
>>> --- a/arch/powerpc/kernel/rtasd.c
>>> +++ b/arch/powerpc/kernel/rtasd.c
>>> @@ -49,7 +49,7 @@ static unsigned int rtas_error_log_buffer_max;
>>>  static unsigned int event_scan;
>>>  static unsigned int rtas_event_scan_rate;
>>>
>>> -static int full_rtas_msgs = 0;
>>> +static bool full_rtas_msgs;
>>>
>>>  /* Stop logging to nvram after first fatal error */
>>>  static int logging_enabled; /* Until we initialize everything,
>>> @@ -592,11 +592,6 @@ __setup("surveillance=", surveillance_setup);
>>>
>>>  static int __init rtasmsgs_setup(char *str)
>>>  {
>>> -   if (strcmp(str, "on") == 0)
>>> -   full_rtas_msgs = 1;
>>> -   else if (strcmp(str, "off") == 0)
>>> -   full_rtas_msgs = 0;
>>> -
>>> -   return 1;
>>> +   return kstrtobool(str, 0, &full_rtas_msgs);
>>>  }
>>>  __setup("rtasmsgs=", rtasmsgs_setup);
>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
>>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>>> index 32274f72fe3f..b9787cae4108 100644
>>> --- a/arch/powerpc/platforms/pseries/ho

Re: [PATCH v2 4/4] param: convert some "on"/"off" users to strtobool

2016-02-04 Thread Kees Cook
On Thu, Feb 4, 2016 at 3:04 PM, Andy Shevchenko
<andy.shevche...@gmail.com> wrote:
> On Thu, Feb 4, 2016 at 11:00 PM, Kees Cook <keesc...@chromium.org> wrote:
>> This changes several users of manual "on"/"off" parsing to use strtobool.
>> (Which means they will now parse y/n/1/0 meaningfully too.)
>>
>
> I like this change, but can you carefully check the acceptance of the
> returned value?
> Briefly I saw 1 or 0 as okay in different places.

Maybe I missed something, but I think this is actually a bug fix. The
two cases are early_param and __setup:

For early_param, the functions are called when walking the command
line in do_early_param via parse_args in parse_early_options. Any
non-zero return values produce a warning (in do_early_param not
parse_args). So this is a bug fix, since the function I touched would
(almost) always return 0, even with bad values (i.e. fixes unreported
bad arguments):

early_param  early_parse_stp       always 0
early_param  early_parse_topology  always 0
early_param  parse_gart_mem        always 0 unless !p (then -EINVAL)

For __setup, these are handled by obsolete_checksetup via
unknown_bootoption via parse_args in start_kernel, as a way to merge
__setup calls that should really be in param (i.e. non-early __setup).
Return values are bubbled up into parse_args and hit:

        default:
                pr_err("%s: `%s' invalid for parameter `%s'\n",
                       doing, val ?: "", param);
                break;

So this is also a bug fix, since these __setup functions returned inverted
values or always failed:

__setup  rtasmsgs_setup      always 1
__setup  setup_cede_offline  1 on success, otherwise 0
__setup  setup_hrtimer_hres  1 on success, otherwise 0
__setup  setup_tick_nohz     1 on success, otherwise 0

So if you specified any of these, they would trigger a bogus "invalid
parameter" report.

I will double-check...

-Kees

>
>
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>> Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
>> Acked-by: Michael Ellerman <m...@ellerman.id.au>
>> Cc: x...@kernel.org
>> Cc: linuxppc-...@lists.ozlabs.org
>> Cc: linux-s...@vger.kernel.org
>> ---
>>  arch/powerpc/kernel/rtasd.c  |  9 ++---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 ++
>>  arch/s390/kernel/time.c  |  8 ++--
>>  arch/s390/kernel/topology.c  |  7 ++-
>>  arch/x86/kernel/aperture_64.c| 12 ++--
>>  include/linux/tick.h |  2 +-
>>  kernel/time/hrtimer.c| 10 ++
>>  kernel/time/tick-sched.c | 10 ++
>>  8 files changed, 15 insertions(+), 53 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
>> index 5a2c049c1c61..567ed5a2f43a 100644
>> --- a/arch/powerpc/kernel/rtasd.c
>> +++ b/arch/powerpc/kernel/rtasd.c
>> @@ -49,7 +49,7 @@ static unsigned int rtas_error_log_buffer_max;
>>  static unsigned int event_scan;
>>  static unsigned int rtas_event_scan_rate;
>>
>> -static int full_rtas_msgs = 0;
>> +static bool full_rtas_msgs;
>>
>>  /* Stop logging to nvram after first fatal error */
>>  static int logging_enabled; /* Until we initialize everything,
>> @@ -592,11 +592,6 @@ __setup("surveillance=", surveillance_setup);
>>
>>  static int __init rtasmsgs_setup(char *str)
>>  {
>> -   if (strcmp(str, "on") == 0)
>> -   full_rtas_msgs = 1;
>> -   else if (strcmp(str, "off") == 0)
>> -   full_rtas_msgs = 0;
>> -
>> -   return 1;
>> +   return kstrtobool(str, 0, &full_rtas_msgs);
>>  }
>>  __setup("rtasmsgs=", rtasmsgs_setup);
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 32274f72fe3f..b9787cae4108 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -47,20 +47,14 @@ static DEFINE_PER_CPU(enum cpu_state_vals, 
>> current_state) = CPU_STATE_OFFLINE;
>>
>>  static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
>>
>> -static int cede_offline_enabled __read_mostly = 1;
>> +static bool cede_offline_enabled __read_mostly = true;
>>
>>  /*
>>   * Enable/disable cede_offline when available.
>>   */
>>  static int __init setup_cede_offline(char *str)
>>  {
>> -   if (!strcmp(str, "off"))
>> -   cede_offline_enabled = 

Re: [PATCH v2 1/4] lib: move strtobool to kstrtobool

2016-02-05 Thread Kees Cook
On Thu, Feb 4, 2016 at 3:55 PM, Rasmus Villemoes
<li...@rasmusvillemoes.dk> wrote:
> On Thu, Feb 04 2016, Kees Cook <keesc...@chromium.org> wrote:
>
>> Create the kstrtobool_from_user helper and move the strtobool logic into
>> the new kstrtobool (matching all the other kstrto* functions). Provide
>> an inline wrapper for existing strtobool callers.
>>
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>> ---
>>  include/linux/kernel.h |  3 +++
>>  include/linux/string.h |  6 +-
>>  lib/kstrtox.c  | 35 +++
>>  lib/string.c   | 29 -
>>  4 files changed, 43 insertions(+), 30 deletions(-)
>>
>> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
>> index f31638c6e873..cdc25f47a23f 100644
>> --- a/include/linux/kernel.h
>> +++ b/include/linux/kernel.h
>> @@ -357,6 +357,7 @@ int __must_check kstrtou16(const char *s, unsigned int 
>> base, u16 *res);
>>  int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
>>  int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
>>  int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
>> +int __must_check kstrtobool(const char *s, unsigned int base, bool *res);
>>
>>  int __must_check kstrtoull_from_user(const char __user *s, size_t count, 
>> unsigned int base, unsigned long long *res);
>>  int __must_check kstrtoll_from_user(const char __user *s, size_t count, 
>> unsigned int base, long long *res);
>> @@ -368,6 +369,8 @@ int __must_check kstrtou16_from_user(const char __user 
>> *s, size_t count, unsigne
>>  int __must_check kstrtos16_from_user(const char __user *s, size_t count, 
>> unsigned int base, s16 *res);
>>  int __must_check kstrtou8_from_user(const char __user *s, size_t count, 
>> unsigned int base, u8 *res);
>>  int __must_check kstrtos8_from_user(const char __user *s, size_t count, 
>> unsigned int base, s8 *res);
>> +int __must_check kstrtobool_from_user(const char __user *s, size_t count,
>> +   unsigned int base, bool *res);
>>
>>  static inline int __must_check kstrtou64_from_user(const char __user *s, 
>> size_t count, unsigned int base, u64 *res)
>>  {
>> diff --git a/include/linux/string.h b/include/linux/string.h
>> index 9eebc66d957a..d2fb21b1081d 100644
>> --- a/include/linux/string.h
>> +++ b/include/linux/string.h
>> @@ -128,7 +128,11 @@ extern char **argv_split(gfp_t gfp, const char *str, 
>> int *argcp);
>>  extern void argv_free(char **argv);
>>
>>  extern bool sysfs_streq(const char *s1, const char *s2);
>> -extern int strtobool(const char *s, bool *res);
>> +extern int kstrtobool(const char *s, unsigned int base, bool *res);
>> +static inline int strtobool(const char *s, bool *res)
>> +{
>> + return kstrtobool(s, 0, res);
>> +}
>>
>>  #ifdef CONFIG_BINARY_PRINTF
>>  int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
>> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
>> index 94be244e8441..e18f088704d7 100644
>> --- a/lib/kstrtox.c
>> +++ b/lib/kstrtox.c
>> @@ -321,6 +321,40 @@ int kstrtos8(const char *s, unsigned int base, s8 *res)
>>  }
>>  EXPORT_SYMBOL(kstrtos8);
>>
>> +/**
>> + * kstrtobool - convert common user inputs into boolean values
>> + * @s: input string
>> + * @base: ignored
>> + * @res: result
>> + *
>> + * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
>> + * Otherwise it will return -EINVAL.  Value pointed to by res is
>> + * updated upon finding a match.
>> + */
>> +int kstrtobool(const char *s, unsigned int base, bool *res)
>> +{
>
> Being able to create the kstrtobool_from_user with a single macro
> invocation is convenient, but I don't think that justifies the ugliness
> of having an unused parameter. People reading this code or trying to use
> the interface will wonder what it's doing there, and it will generate
> slightly larger code for all the users of strtobool.
>
> So I'd just make a separate explicit definition of kstrtobool_from_user
> (the stack buffer sizing doesn't apply to the strings we want to parse
> anyway, though 11 is of course plenty).

Okay, thanks. So many things were bothering me, but I feared code
duplication would be seen as worse. I'm much happier to drop the
unused argument. :)

I'll send a v3 with all the changes.
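
For illustration only, a userspace sketch of what an open-coded helper
without the "base" argument could look like. It is not the kernel
implementation: memcpy() stands in for copy_from_user() (so there is no
-EFAULT path), and parse_bool_first_char and parse_bool_from_buf are
hypothetical names. The parser follows the 'Yy1Nn0' first-character rule
from the kernel-doc comment quoted above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* First-character parser, mirroring the 'Yy1Nn0' rule from patch 1/4. */
static int parse_bool_first_char(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y':
	case 'Y':
	case '1':
		*res = true;
		return 0;
	case 'n':
	case 'N':
	case '0':
		*res = false;
		return 0;
	default:
		return -EINVAL;
	}
}

/*
 * Hypothetical userspace analogue of an open-coded *_from_user helper.
 * Only the first few bytes matter, so a small fixed buffer is enough;
 * longer input is truncated before parsing.
 */
static int parse_bool_from_buf(const char *src, size_t count, bool *res)
{
	char buf[4];	/* a few bytes of input plus the terminator */

	if (count > sizeof(buf) - 1)
		count = sizeof(buf) - 1;
	memcpy(buf, src, count);
	buf[count] = '\0';	/* always hand the parser a terminated string */
	return parse_bool_first_char(buf, res);
}
```

Because the copy is bounded and always terminated, the parser never sees
an unterminated buffer regardless of how many bytes the caller supplies.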

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


[PATCH v3 1/4] lib: move strtobool to kstrtobool

2016-02-05 Thread Kees Cook
Create the kstrtobool_from_user helper and move the strtobool logic into
the new kstrtobool (matching all the other kstrto* functions). Provide
an inline wrapper for existing strtobool callers.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
v3:
- drop needless "base" argument, rasmus
---
 include/linux/kernel.h |  2 ++
 include/linux/string.h |  6 +-
 lib/kstrtox.c  | 50 ++
 lib/string.c   | 29 -
 4 files changed, 57 insertions(+), 30 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index f31638c6e873..f4fa2b29c38c 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -357,6 +357,7 @@ int __must_check kstrtou16(const char *s, unsigned int 
base, u16 *res);
 int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
 int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
 int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
+int __must_check kstrtobool(const char *s, bool *res);
 
 int __must_check kstrtoull_from_user(const char __user *s, size_t count, 
unsigned int base, unsigned long long *res);
 int __must_check kstrtoll_from_user(const char __user *s, size_t count, 
unsigned int base, long long *res);
@@ -368,6 +369,7 @@ int __must_check kstrtou16_from_user(const char __user *s, 
size_t count, unsigne
 int __must_check kstrtos16_from_user(const char __user *s, size_t count, 
unsigned int base, s16 *res);
 int __must_check kstrtou8_from_user(const char __user *s, size_t count, 
unsigned int base, u8 *res);
 int __must_check kstrtos8_from_user(const char __user *s, size_t count, 
unsigned int base, s8 *res);
+int __must_check kstrtobool_from_user(const char __user *s, size_t count, bool 
*res);
 
 static inline int __must_check kstrtou64_from_user(const char __user *s, 
size_t count, unsigned int base, u64 *res)
 {
diff --git a/include/linux/string.h b/include/linux/string.h
index 9eebc66d957a..2217224684c9 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -128,7 +128,11 @@ extern char **argv_split(gfp_t gfp, const char *str, int 
*argcp);
 extern void argv_free(char **argv);
 
 extern bool sysfs_streq(const char *s1, const char *s2);
-extern int strtobool(const char *s, bool *res);
+extern int kstrtobool(const char *s, bool *res);
+static inline int strtobool(const char *s, bool *res)
+{
+   return kstrtobool(s, res);
+}
 
 #ifdef CONFIG_BINARY_PRINTF
 int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index 94be244e8441..e8ba4a013e82 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -321,6 +321,56 @@ int kstrtos8(const char *s, unsigned int base, s8 *res)
 }
 EXPORT_SYMBOL(kstrtos8);
 
+/**
+ * kstrtobool - convert common user inputs into boolean values
+ * @s: input string
+ * @res: result
+ *
+ * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
+ * Otherwise it will return -EINVAL.  Value pointed to by res is
+ * updated upon finding a match.
+ */
+int kstrtobool(const char *s, bool *res)
+{
+   if (!s)
+   return -EINVAL;
+
+   switch (s[0]) {
+   case 'y':
+   case 'Y':
+   case '1':
+   *res = true;
+   return 0;
+   case 'n':
+   case 'N':
+   case '0':
+   *res = false;
+   return 0;
+   default:
+   break;
+   }
+
+   return -EINVAL;
+}
+EXPORT_SYMBOL(kstrtobool);
+
+/*
+ * Since "base" would be a nonsense argument, this open-codes the
+ * _from_user helper instead of using the helper macro below.
+ */
+int kstrtobool_from_user(const char __user *s, size_t count, bool *res)
+{
+   /* Longest string needed to differentiate, newline, terminator */
+   char buf[4];
+
+   count = min(count, sizeof(buf) - 1);
+   if (copy_from_user(buf, s, count))
+   return -EFAULT;
+   buf[count] = '\0';
+   return kstrtobool(buf, res);
+}
+EXPORT_SYMBOL(kstrtobool_from_user);
+
 #define kstrto_from_user(f, g, type)   \
 int f(const char __user *s, size_t count, unsigned int base, type *res)
\
 {  \
diff --git a/lib/string.c b/lib/string.c
index 0323c0d5629a..1a90db9bc6e1 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -630,35 +630,6 @@ bool sysfs_streq(const char *s1, const char *s2)
 }
 EXPORT_SYMBOL(sysfs_streq);
 
-/**
- * strtobool - convert common user inputs into boolean values
- * @s: input string
- * @res: result
- *
- * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
- * Otherwise it will return -EINVAL.  Value pointed to by res is
- * updated upon finding a match.
- */
-int strtobool(const char *s, bool *res)
-{
-   switch (s[0]) {
-   case 'y':
-   case 'Y':
-   case '1':
-   *res = t

[PATCH v3 4/4] param: convert some "on"/"off" users to strtobool

2016-02-05 Thread Kees Cook
This changes several users of manual "on"/"off" parsing to use strtobool.

Some side-effects:
- these uses will now parse y/n/1/0 meaningfully too
- the early_param uses will now bubble up parse errors

Signed-off-by: Kees Cook <keesc...@chromium.org>
Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
Acked-by: Michael Ellerman <m...@ellerman.id.au>
Cc: x...@kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
---
v3:
- retain __setup return values, andy.shevchenko
- remove unused "base" argument
---
 arch/powerpc/kernel/rtasd.c  |  7 ++-
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 ++
 arch/s390/kernel/time.c  |  8 ++--
 arch/s390/kernel/topology.c  |  7 ++-
 arch/x86/kernel/aperture_64.c| 12 ++--
 include/linux/tick.h |  2 +-
 kernel/time/hrtimer.c| 10 ++
 kernel/time/tick-sched.c | 10 ++
 8 files changed, 15 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 5a2c049c1c61..0ae5cb84d4e2 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -49,7 +49,7 @@ static unsigned int rtas_error_log_buffer_max;
 static unsigned int event_scan;
 static unsigned int rtas_event_scan_rate;
 
-static int full_rtas_msgs = 0;
+static bool full_rtas_msgs;
 
 /* Stop logging to nvram after first fatal error */
 static int logging_enabled; /* Until we initialize everything,
@@ -592,10 +592,7 @@ __setup("surveillance=", surveillance_setup);
 
 static int __init rtasmsgs_setup(char *str)
 {
-   if (strcmp(str, "on") == 0)
-   full_rtas_msgs = 1;
-   else if (strcmp(str, "off") == 0)
-   full_rtas_msgs = 0;
+   kstrtobool(str, &full_rtas_msgs);
 
return 1;
 }
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 32274f72fe3f..282837a1d74b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -47,20 +47,14 @@ static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = 
CPU_STATE_OFFLINE;
 
 static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
 
-static int cede_offline_enabled __read_mostly = 1;
+static bool cede_offline_enabled __read_mostly = true;
 
 /*
  * Enable/disable cede_offline when available.
  */
 static int __init setup_cede_offline(char *str)
 {
-   if (!strcmp(str, "off"))
-   cede_offline_enabled = 0;
-   else if (!strcmp(str, "on"))
-   cede_offline_enabled = 1;
-   else
-   return 0;
-   return 1;
+   return (kstrtobool(str, &cede_offline_enabled) == 0);
 }
 
 __setup("cede_offline=", setup_cede_offline);
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 99f84ac31307..580bc7299ec3 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -1433,7 +1433,7 @@ device_initcall(etr_init_sysfs);
 /*
  * Server Time Protocol (STP) code.
  */
-static int stp_online;
+static bool stp_online;
 static struct stp_sstpi stp_info;
 static void *stp_page;
 
@@ -1444,11 +1444,7 @@ static struct timer_list stp_timer;
 
 static int __init early_parse_stp(char *p)
 {
-   if (strncmp(p, "off", 3) == 0)
-   stp_online = 0;
-   else if (strncmp(p, "on", 2) == 0)
-   stp_online = 1;
-   return 0;
+   return kstrtobool(p, &stp_online);
 }
 early_param("stp", early_parse_stp);
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 40b8102fdadb..64298a867589 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -37,7 +37,7 @@ static void set_topology_timer(void);
 static void topology_work_fn(struct work_struct *work);
 static struct sysinfo_15_1_x *tl_info;
 
-static int topology_enabled = 1;
+static bool topology_enabled = true;
 static DECLARE_WORK(topology_work, topology_work_fn);
 
 /*
@@ -444,10 +444,7 @@ static const struct cpumask *cpu_book_mask(int cpu)
 
 static int __init early_parse_topology(char *p)
 {
-   if (strncmp(p, "off", 3))
-   return 0;
-   topology_enabled = 0;
-   return 0;
+   return kstrtobool(p, &topology_enabled);
 }
 early_param("topology", early_parse_topology);
 
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 6e85f713641d..0a2bb1f62e72 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -227,19 +227,11 @@ static u32 __init search_agp_bridge(u32 *order, int 
*valid_agp)
return 0;
 }
 
-static int gart_fix_e820 __initdata = 1;
+static bool gart_fix_e820 __initdata = true;
 
 static int __init parse_gart_mem(char *p)
 

[PATCH v3 2/4] lib: update single-char callers of strtobool

2016-02-05 Thread Kees Cook
Some callers of strtobool were passing a pointer to unterminated strings.
In preparation for adding multi-character processing to kstrtobool, update
the callers to not pass single-character pointers, and switch to using the
new kstrtobool_from_user helper where possible.

Signed-off-by: Kees Cook <keesc...@chromium.org>
Cc: Amitkumar Karwar <akar...@marvell.com>
Cc: Nishant Sarmukadam <nisha...@marvell.com>
Cc: Kalle Valo <kv...@codeaurora.org>
Cc: Steve French <sfre...@samba.org>
Cc: linux-c...@vger.kernel.org
---
v3:
- drop needless buffer, andy.shevchenko
- drop unused "base" argument
---
 drivers/net/wireless/marvell/mwifiex/debugfs.c | 10 ++---
 fs/cifs/cifs_debug.c   | 56 +++---
 fs/cifs/cifs_debug.h   |  2 +-
 fs/cifs/cifsfs.c   |  6 +--
 fs/cifs/cifsglob.h |  4 +-
 5 files changed, 24 insertions(+), 54 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
b/drivers/net/wireless/marvell/mwifiex/debugfs.c
index 0b9c580af988..2eff989c6d9f 100644
--- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
+++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
@@ -880,14 +880,12 @@ mwifiex_reset_write(struct file *file,
 {
struct mwifiex_private *priv = file->private_data;
struct mwifiex_adapter *adapter = priv->adapter;
-   char cmd;
bool result;
+   int rc;
 
-   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
-   return -EFAULT;
-
-   if (strtobool(&cmd, &result))
-   return -EINVAL;
+   rc = kstrtobool_from_user(ubuf, count, &result);
+   if (rc)
+   return rc;
 
if (!result)
return -EINVAL;
diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index 50b268483302..788e19195991 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -255,7 +255,6 @@ static const struct file_operations 
cifs_debug_data_proc_fops = {
 static ssize_t cifs_stats_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
bool bv;
int rc;
struct list_head *tmp1, *tmp2, *tmp3;
@@ -263,11 +262,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
struct cifs_ses *ses;
struct cifs_tcon *tcon;
 
-   rc = get_user(c, buffer);
-   if (rc)
-   return rc;
-
-   if (strtobool(&c, &bv) == 0) {
+   rc = kstrtobool_from_user(buffer, count, &bv);
+   if (rc == 0) {
 #ifdef CONFIG_CIFS_STATS2
atomic_set(, 0);
atomic_set(, 0);
@@ -290,6 +286,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
}
}
spin_unlock(_tcp_ses_lock);
+   } else {
+   return rc;
}
 
return count;
@@ -433,17 +431,17 @@ static int cifsFYI_proc_open(struct inode *inode, struct 
file *file)
 static ssize_t cifsFYI_proc_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
 {
-   char c;
+   char c[2] = { '\0' };
bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = get_user(c[0], buffer);
if (rc)
return rc;
-   if (strtobool(&c, &bv) == 0)
+   if (strtobool(c, &bv) == 0)
cifsFYI = bv;
-   else if ((c > '1') && (c <= '9'))
-   cifsFYI = (int) (c - '0'); /* see cifs_debug.h for meanings */
+   else if ((c[0] > '1') && (c[0] <= '9'))
+   cifsFYI = (int) (c[0] - '0'); /* see cifs_debug.h for meanings 
*/
 
return count;
 }
@@ -471,20 +469,12 @@ static int cifs_linux_ext_proc_open(struct inode *inode, 
struct file *file)
 static ssize_t cifs_linux_ext_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
-   bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = kstrtobool_from_user(buffer, count, &linuxExtEnabled);
if (rc)
return rc;
 
-   rc = strtobool(&c, &bv);
-   if (rc)
-   return rc;
-
-   linuxExtEnabled = bv;
-
return count;
 }
 
@@ -511,20 +501,12 @@ static int cifs_lookup_cache_proc_open(struct inode 
*inode, struct file *file)
 static ssize_t cifs_lookup_cache_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
-   bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = kstrtobool_from_user(buffer, count, &lookupCacheEnabled);
if (rc)
return rc;
 
-   rc = strtobool(&c, &bv);
-   if (rc)
-   return rc;
-
-   lookupCacheEnabled = bv;
-
return count;
 }
 
@@ -551,20 +533,12 @@ static int traceSMB_proc_open(struct inode *inode, struct 
file *file)
 static ssize_t traceSMB_proc_write(struct file *fi

[PATCH v3 0/4] lib: add "on" and "off" to strtobool

2016-02-05 Thread Kees Cook
This consolidates logic for handling "on"/"off" parsing for bools into the
strtobool function, by way of moving it into kstrtobool (with helpers),
and updating various callers.

v3:
- removed unused "base" argument
- fixed missing description change
- retained inverted __setup return values
- removed needless extra buffer in cifs

v2:
- moved to kstrto* style

 arch/powerpc/kernel/rtasd.c|7 --
 arch/powerpc/platforms/pseries/hotplug-cpu.c   |   10 ---
 arch/s390/kernel/time.c|8 ---
 arch/s390/kernel/topology.c|7 --
 arch/x86/kernel/aperture_64.c  |   12 
 drivers/net/wireless/marvell/mwifiex/debugfs.c |   10 +--
 fs/cifs/cifs_debug.c   |   56 +
 fs/cifs/cifs_debug.h   |2 
 fs/cifs/cifsfs.c   |6 +-
 fs/cifs/cifsglob.h |4 -
 include/linux/kernel.h |2 
 include/linux/string.h |6 +-
 include/linux/tick.h   |2 
 kernel/time/hrtimer.c  |   10 ---
 kernel/time/tick-sched.c   |   10 ---
 lib/kstrtox.c  |   64 +
 lib/string.c   |   29 ---
 17 files changed, 110 insertions(+), 135 deletions(-)

-Kees



[PATCH v3 3/4] lib: add "on"/"off" support to kstrtobool

2016-02-05 Thread Kees Cook
Add support for "on" and "off" when converting to boolean.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
v3:
- add dropped description change, andy.shevchenko
---
 lib/kstrtox.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index e8ba4a013e82..d8a5cf66c316 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -326,9 +326,9 @@ EXPORT_SYMBOL(kstrtos8);
  * @s: input string
  * @res: result
  *
- * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
- * Otherwise it will return -EINVAL.  Value pointed to by res is
- * updated upon finding a match.
+ * This routine returns 0 iff the first character is one of 'Yy1Nn0', or
+ * [oO][NnFf] for "on" and "off". Otherwise it will return -EINVAL.  Value
+ * pointed to by res is updated upon finding a match.
  */
 int kstrtobool(const char *s, bool *res)
 {
@@ -346,6 +346,20 @@ int kstrtobool(const char *s, bool *res)
case '0':
*res = false;
return 0;
+   case 'o':
+   case 'O':
+   switch (s[1]) {
+   case 'n':
+   case 'N':
+   *res = true;
+   return 0;
+   case 'f':
+   case 'F':
+   *res = false;
+   return 0;
+   default:
+   break;
+   }
default:
break;
}
-- 
2.6.3



Re: [PATCH v2 2/4] lib: update single-char callers of strtobool

2016-02-05 Thread Kees Cook
On Fri, Feb 5, 2016 at 2:46 AM, David Laight <david.lai...@aculab.com> wrote:
> From: Kees Cook
>> Sent: 04 February 2016 21:01
>> Some callers of strtobool were passing a pointer to unterminated strings.
>> In preparation for adding multi-character processing to kstrtobool, update
>> the callers to not pass single-character pointers, and switch to using the
>> new kstrtobool_from_user helper where possible.
>
> Personally I think you should change the name of the function so that the
> compiler (and linker) will pick up places that have not been changed.
> Relying on people to make the required changes will cause problems.

After the single-character users were pointed out, I looked for others
and there aren't any.

> The current code (presumably) treats "no", "nyet" and "nkjkkrkjrkjterkj" as 
> false.
> Changing that behaviour will break things.

There's no change there. All three of those will still be "false".
Perhaps my changelog shouldn't say "unterminated" but rather
"character array".
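
To illustrate, here is a userspace sketch of the matching rule from the
"on"/"off" patch. kstrtobool_sketch mirrors the switch in that patch but
is not the kernel build of it, and parse_bool_char is a hypothetical
wrapper showing the two-byte character array that single-character
callers need once the second byte may be read.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace copy of the matching rule from the "on"/"off" patch (3/4). */
static int kstrtobool_sketch(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y':
	case 'Y':
	case '1':
		*res = true;
		return 0;
	case 'n':
	case 'N':
	case '0':
		*res = false;
		return 0;
	case 'o':
	case 'O':
		/* Only on this path is the second byte read. */
		switch (s[1]) {
		case 'n':
		case 'N':
			*res = true;
			return 0;
		case 'f':
		case 'F':
			*res = false;
			return 0;
		default:
			break;
		}
		break;
	default:
		break;
	}

	return -EINVAL;
}

/* Hypothetical wrapper for callers that hold a single character. */
static int parse_bool_char(char c, bool *res)
{
	char buf[2] = { c, '\0' };	/* terminated, so s[1] is safe to read */

	return kstrtobool_sketch(buf, res);
}
```

In this sketch "no", "nyet" and "nkjkkrkjrkjterkj" all parse as false
because only the leading 'n' is examined, and a lone 'o' is rejected
with -EINVAL.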

> If you want to support "on" and "off", then maybe check for the supplied 
> string
> starting with the character sequences "on\0" and "off\0" (as well as any 
> others).
> This doesn't need the input string be '\0' terminated - since you match y and 
> n
> without looking at the 2nd byte.
> You'd have to be extremely unlucky to get a page fault in the 3 bytes
> following an 'o' if the caller supplied a single byte buffer.

I'd prefer to keep the switch statement as short as possible, and I
don't want to do full string compares. And as you say, even fixing the
single-byte callers seems like a needless exercise, but seeing as how
it's a net clean-up, I think it's good the way I've got the series.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH 1/3] lib: fix callers of strtobool to use char array

2016-02-04 Thread Kees Cook
On Mon, Feb 1, 2016 at 5:17 AM, Andy Shevchenko
<andy.shevche...@gmail.com> wrote:
> On Thu, Jan 28, 2016 at 4:17 PM, Kees Cook <keesc...@chromium.org> wrote:
>> Some callers of strtobool were passing a pointer to unterminated strings.
>> This fixes the issue and consolidates some logic in cifs.
>
> My comments below.
>
> First of all I don't think currently there is an issue in cifs, since
> strtobool checks only the first character of the input string, or are you
> talking about something else?

Right, no, this is a fix before extending strtobool to parse the
second character in the string (for handling "on" and "off").
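
For illustration, here is a minimal userspace sketch of the two-element
array pattern the patch uses. Once the parser looks at s[1], a caller that
passes the address of a lone char would let it read one byte past the
object. The name load_one_char() is hypothetical, and a plain assignment
stands in for get_user()/copy_from_user().

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy exactly one byte from src into a
 * two-element array so that dst[1] is always a valid '\0' terminator.
 * In the kernel the byte would come from get_user() or copy_from_user().
 */
void load_one_char(char dst[2], const char *src)
{
	dst[0] = src[0];
	dst[1] = '\0';
}

/* Illustrative check: the result is a proper C string of length <= 1. */
void load_one_char_demo(void)
{
	char c[2] = { '\0' };

	load_one_char(c, "yes");
	assert(c[0] == 'y' && c[1] == '\0');
	assert(strlen(c) == 1);
}
```

With this layout a parser may safely inspect both c[0] and c[1], whatever
the caller originally supplied.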

>> diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
>> b/drivers/net/wireless/marvell/mwifiex/debugfs.c
>> index 0b9c580af988..76af60899c69 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
>> @@ -880,13 +880,13 @@ mwifiex_reset_write(struct file *file,
>>  {
>> struct mwifiex_private *priv = file->private_data;
>> struct mwifiex_adapter *adapter = priv->adapter;
>> -   char cmd;
>> +   char cmd[2] = { '\0' };
>> bool result;
>>
>> -   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
>> +   if (copy_from_user(cmd, ubuf, sizeof(char)))
>> return -EFAULT;
>>
>> -   if (strtobool(&cmd, &result))
>> +   if (strtobool(cmd, &result))
>> return -EINVAL;
>
> Can we do strtobool_from_user() instead like kstrto*from_user() and
> similar helpers are done?

Yeah, that might clean this up a bit more. I will add it.
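
A rough userspace sketch of what such a helper could look like follows.
Everything here is an assumption for illustration: the names
model_strtobool() and model_strtobool_from_user() are made up, memcpy()
stands in for copy_from_user() (which in the kernel can fail with -EFAULT),
and the parser models only the first-character behaviour discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the first-character strtobool() under discussion. */
static int model_strtobool(const char *s, bool *res)
{
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	default:
		return -EINVAL;
	}
}

/*
 * Hypothetical strtobool_from_user()-style wrapper: copy a bounded
 * number of bytes into a local buffer, terminate it, then parse.
 */
int model_strtobool_from_user(const char *ubuf, size_t count, bool *res)
{
	char buf[4] = { '\0' };	/* room for "off" plus the terminator */

	if (count > sizeof(buf) - 1)
		count = sizeof(buf) - 1;
	memcpy(buf, ubuf, count);
	buf[count] = '\0';

	return model_strtobool(buf, res);
}

/* Illustrative usage, e.g. for a write of "1\n" to a debugfs file. */
void model_strtobool_from_user_demo(void)
{
	bool v = false;

	assert(model_strtobool_from_user("1\n", 2, &v) == 0 && v);
}
```

The point of the wrapper is that each caller no longer needs its own local
buffer and termination logic.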

>> if (!result)
>> diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
>> index 50b268483302..2f7ffcc9e364 100644
>> --- a/fs/cifs/cifs_debug.c
>> +++ b/fs/cifs/cifs_debug.c
>> @@ -251,11 +251,29 @@ static const struct file_operations 
>> cifs_debug_data_proc_fops = {
>> .release= single_release,
>>  };
>>
>> +static int get_user_bool(const char __user *buffer, bool *store)
>> +{
>> +   char c[2] = { '\0' };
>> +   bool bv;
>> +   int rc;
>> +
>> +   rc = get_user(c[0], buffer);
>> +   if (rc)
>> +   return rc;
>> +
>> +   rc = strtobool(c, &bv);
>> +   if (rc)
>> +   return rc;
>> +
>> +   *store = bv;
>> +
>> +   return 0;
>> +}
>> +
>>  #ifdef CONFIG_CIFS_STATS
>>  static ssize_t cifs_stats_proc_write(struct file *file,
>> const char __user *buffer, size_t count, loff_t *ppos)
>>  {
>> -   char c;
>> bool bv;
>> int rc;
>> struct list_head *tmp1, *tmp2, *tmp3;
>> @@ -263,34 +281,32 @@ static ssize_t cifs_stats_proc_write(struct file *file,
>> struct cifs_ses *ses;
>> struct cifs_tcon *tcon;
>>
>> -   rc = get_user(c, buffer);
>> +   rc = get_user_bool(buffer, &bv);
>> if (rc)
>> return rc;
>>
>> -   if (strtobool(&c, &bv) == 0) {
>>  #ifdef CONFIG_CIFS_STATS2
>
> I would suggest to do a separate patch which just changes a pattern
> and thus indentation without changing anything in functionality.

Okay, noted.

>> -   atomic_set(, 0);
>> -   atomic_set(, 0);
>> +   atomic_set(, 0);
>> +   atomic_set(, 0);
>>  #endif /* CONFIG_CIFS_STATS2 */
>> -   spin_lock(&cifs_tcp_ses_lock);
>> -   list_for_each(tmp1, &cifs_tcp_ses_list) {
>> -   server = list_entry(tmp1, struct TCP_Server_Info,
>> -   tcp_ses_list);
>> -   list_for_each(tmp2, &server->smb_ses_list) {
>> -   ses = list_entry(tmp2, struct cifs_ses,
>> -smb_ses_list);
>> -   list_for_each(tmp3, &ses->tcon_list) {
>> -   tcon = list_entry(tmp3,
>> - struct cifs_tcon,
>> - tcon_list);
>> -   atomic_set(&tcon->num_smbs_sent, 0);
>> -   if (server->ops->clear_stats)
>> -   
>> server->ops->clear_stats(tcon);
>> -   }
>> +   spin_lock(&cifs_tcp_ses_lock);
>> +   list_

[PATCH] lib: fix callers of strtobool to use char array

2016-01-27 Thread Kees Cook
Some callers of strtobool were passing a pointer to unterminated strings.
This fixes the issue and consolidates some logic in cifs.

Signed-off-by: Kees Cook <keesc...@chromium.org>
Cc: Amitkumar Karwar <akar...@marvell.com>
Cc: Nishant Sarmukadam <nisha...@marvell.com>
Cc: Kalle Valo <kv...@codeaurora.org>
Cc: Steve French <sfre...@samba.org>
Cc: linux-c...@vger.kernel.org
---
This is preparation for adding "on"/"off" support to strtobool(), and I
want to make sure the solution isn't upsetting to the two callers. :)
---
 drivers/net/wireless/marvell/mwifiex/debugfs.c |  6 +-
 fs/cifs/cifs_debug.c   | 78 --
 fs/cifs/cifs_debug.h   |  2 +-
 fs/cifs/cifsfs.c   |  6 +-
 fs/cifs/cifsglob.h |  4 +-
 5 files changed, 44 insertions(+), 52 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
b/drivers/net/wireless/marvell/mwifiex/debugfs.c
index 0b9c580af988..76af60899c69 100644
--- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
+++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
@@ -880,13 +880,13 @@ mwifiex_reset_write(struct file *file,
 {
struct mwifiex_private *priv = file->private_data;
struct mwifiex_adapter *adapter = priv->adapter;
-   char cmd;
+   char cmd[2] = { '\0' };
bool result;
 
-   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
+   if (copy_from_user(cmd, ubuf, sizeof(char)))
return -EFAULT;
 
-   if (strtobool(&cmd, &result))
+   if (strtobool(cmd, &result))
return -EINVAL;
 
if (!result)
diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index 50b268483302..cafe464fa1b7 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -251,11 +251,29 @@ static const struct file_operations 
cifs_debug_data_proc_fops = {
.release= single_release,
 };
 
+static int get_user_bool(const char __user *buffer, bool *store)
+{
+   char c[2] = { '\0' };
+   bool bv;
+   int rc;
+
+   rc = get_user(c[0], buffer);
+   if (rc)
+   return rc;
+
+   rc = strtobool(c, &bv);
+   if (rc)
+   return rc;
+
+   *store = bv;
+
+   return 0;
+}
+
 #ifdef CONFIG_CIFS_STATS
 static ssize_t cifs_stats_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
bool bv;
int rc;
struct list_head *tmp1, *tmp2, *tmp3;
@@ -263,11 +281,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
struct cifs_ses *ses;
struct cifs_tcon *tcon;
 
-   rc = get_user(c, buffer);
-   if (rc)
-   return rc;
-
-   if (strtobool(&c, &bv) == 0) {
+   rc = get_user_bool(buffer, &bv);
+   if (rc == 0) {
 #ifdef CONFIG_CIFS_STATS2
atomic_set(, 0);
atomic_set(, 0);
@@ -290,7 +305,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
}
}
spin_unlock(&cifs_tcp_ses_lock);
-   }
+   } else
+   return rc;
 
return count;
 }
@@ -433,17 +449,17 @@ static int cifsFYI_proc_open(struct inode *inode, struct 
file *file)
 static ssize_t cifsFYI_proc_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
 {
-   char c;
+   char c[2] = { '\0' };
bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = get_user(c[0], buffer);
if (rc)
return rc;
-   if (strtobool(&c, &bv) == 0)
+   if (strtobool(c, &bv) == 0)
cifsFYI = bv;
-   else if ((c > '1') && (c <= '9'))
-   cifsFYI = (int) (c - '0'); /* see cifs_debug.h for meanings */
+   else if ((c[0] > '1') && (c[0] <= '9'))
+   cifsFYI = (int) (c[0] - '0'); /* see cifs_debug.h for meanings 
*/
 
return count;
 }
@@ -471,20 +487,12 @@ static int cifs_linux_ext_proc_open(struct inode *inode, 
struct file *file)
 static ssize_t cifs_linux_ext_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
-   bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = get_user_bool(buffer, &linuxExtEnabled);
if (rc)
return rc;
 
-   rc = strtobool(&c, &bv);
-   if (rc)
-   return rc;
-
-   linuxExtEnabled = bv;
-
return count;
 }
 
@@ -511,20 +519,12 @@ static int cifs_lookup_cache_proc_open(struct inode 
*inode, struct file *file)
 static ssize_t cifs_lookup_cache_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
-   bool bv;
int rc;
 
-   rc = get_user(c, buffer);
+   rc = get_user_bool(buffer, );
if (rc)
return rc;
 
-   rc = st

Re: [PATCH] lib: fix callers of strtobool to use char array

2016-01-27 Thread Kees Cook
On Wed, Jan 27, 2016 at 4:58 PM, Joe Perches <j...@perches.com> wrote:
> On Wed, 2016-01-27 at 16:45 -0800, Kees Cook wrote:
>> Some callers of strtobool were passing a pointer to unterminated strings.
>> This fixes the issue and consolidates some logic in cifs.
>
> This may be incomplete as it duplicates the behavior for
> the old number of characters, but this is not a solution
> for the entry of a bool that is "on" or "off".

As in, the on/off patch is missing? Yes, that's been sent separately,
but I wanted to make sure these changes weren't upsetting to the two
users.

>> diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
> []
>> @@ -290,7 +305,8 @@ static ssize_t cifs_stats_proc_write(struct file *file,
>>   }
>>   }
>>   spin_unlock(&cifs_tcp_ses_lock);
>> - }
>> + } else
>> + return rc;
>
> Likely better to reverse the test and unindent the
> preceding block.
>
> Otherwise, please make sure to use the general brace
> form of when one branch needs braces, the other branch
> should have them too.

Okay, sure, I'll rework this and send it together with the on/off patch.
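
As a small illustration of the "reverse the test and unindent" suggestion,
the two shapes can be compared side by side. This is only a sketch: the
function names and the single stat counter are hypothetical stand-ins for
the real cifs statistics-reset code.

```c
#include <assert.h>
#include <errno.h>

/* Nested form: the work sits inside the success branch. */
long write_nested(int rc, long count, int *stat)
{
	if (rc == 0) {
		*stat = 0;
	} else {
		return rc;
	}

	return count;
}

/* Reversed test: handle the error first, then do the work unindented. */
long write_early_return(int rc, long count, int *stat)
{
	if (rc)
		return rc;

	*stat = 0;

	return count;
}

/* Both shapes behave the same on success and on failure. */
void write_shapes_demo(void)
{
	int a = 5, b = 5;

	assert(write_nested(0, 3, &a) == write_early_return(0, 3, &b));
	assert(a == 0 && b == 0);
}
```

The second form keeps the common path at one indentation level, which is
what lets the large block in the cifs hunk lose a level of nesting.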

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


[PATCH 1/3] lib: fix callers of strtobool to use char array

2016-01-28 Thread Kees Cook
Some callers of strtobool were passing a pointer to unterminated strings.
This fixes the issue and consolidates some logic in cifs.

Signed-off-by: Kees Cook <keesc...@chromium.org>
Cc: Amitkumar Karwar <akar...@marvell.com>
Cc: Nishant Sarmukadam <nisha...@marvell.com>
Cc: Kalle Valo <kv...@codeaurora.org>
Cc: Steve French <sfre...@samba.org>
Cc: linux-c...@vger.kernel.org
---
 drivers/net/wireless/marvell/mwifiex/debugfs.c |   6 +-
 fs/cifs/cifs_debug.c   | 106 -
 fs/cifs/cifs_debug.h   |   2 +-
 fs/cifs/cifsfs.c   |   6 +-
 fs/cifs/cifsglob.h |   4 +-
 5 files changed, 58 insertions(+), 66 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
b/drivers/net/wireless/marvell/mwifiex/debugfs.c
index 0b9c580af988..76af60899c69 100644
--- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
+++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
@@ -880,13 +880,13 @@ mwifiex_reset_write(struct file *file,
 {
struct mwifiex_private *priv = file->private_data;
struct mwifiex_adapter *adapter = priv->adapter;
-   char cmd;
+   char cmd[2] = { '\0' };
bool result;
 
-   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
+   if (copy_from_user(cmd, ubuf, sizeof(char)))
return -EFAULT;
 
-   if (strtobool(&cmd, &result))
+   if (strtobool(cmd, &result))
return -EINVAL;
 
if (!result)
diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index 50b268483302..2f7ffcc9e364 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -251,11 +251,29 @@ static const struct file_operations 
cifs_debug_data_proc_fops = {
.release= single_release,
 };
 
+static int get_user_bool(const char __user *buffer, bool *store)
+{
+   char c[2] = { '\0' };
+   bool bv;
+   int rc;
+
+   rc = get_user(c[0], buffer);
+   if (rc)
+   return rc;
+
+   rc = strtobool(c, &bv);
+   if (rc)
+   return rc;
+
+   *store = bv;
+
+   return 0;
+}
+
 #ifdef CONFIG_CIFS_STATS
 static ssize_t cifs_stats_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
bool bv;
int rc;
struct list_head *tmp1, *tmp2, *tmp3;
@@ -263,34 +281,32 @@ static ssize_t cifs_stats_proc_write(struct file *file,
struct cifs_ses *ses;
struct cifs_tcon *tcon;
 
-   rc = get_user(c, buffer);
+   rc = get_user_bool(buffer, &bv);
if (rc)
return rc;
 
-   if (strtobool(&c, &bv) == 0) {
 #ifdef CONFIG_CIFS_STATS2
-   atomic_set(, 0);
-   atomic_set(, 0);
+   atomic_set(, 0);
+   atomic_set(, 0);
 #endif /* CONFIG_CIFS_STATS2 */
-   spin_lock(&cifs_tcp_ses_lock);
-   list_for_each(tmp1, &cifs_tcp_ses_list) {
-   server = list_entry(tmp1, struct TCP_Server_Info,
-   tcp_ses_list);
-   list_for_each(tmp2, &server->smb_ses_list) {
-   ses = list_entry(tmp2, struct cifs_ses,
-smb_ses_list);
-   list_for_each(tmp3, &ses->tcon_list) {
-   tcon = list_entry(tmp3,
- struct cifs_tcon,
- tcon_list);
-   atomic_set(&tcon->num_smbs_sent, 0);
-   if (server->ops->clear_stats)
-   server->ops->clear_stats(tcon);
-   }
+   spin_lock(&cifs_tcp_ses_lock);
+   list_for_each(tmp1, &cifs_tcp_ses_list) {
+   server = list_entry(tmp1, struct TCP_Server_Info,
+   tcp_ses_list);
+   list_for_each(tmp2, &server->smb_ses_list) {
+   ses = list_entry(tmp2, struct cifs_ses,
+smb_ses_list);
+   list_for_each(tmp3, &ses->tcon_list) {
+   tcon = list_entry(tmp3,
+ struct cifs_tcon,
+ tcon_list);
+   atomic_set(&tcon->num_smbs_sent, 0);
+   if (server->ops->clear_stats)
+   server->ops->clear_stats(tcon);
}
}
-   spin_unlock(&cifs_tcp_ses_lock);
}
+   spin_unlock(&cifs_tcp_ses_lock);
 
return count;
 }
@@ -433,17 +449,17 @@ static int cifsFYI_proc_open(struct inode *inode, struct 
file *file)
 static ssize_t cifsFYI_proc_write(struct fil

[PATCH 2/3] lib: add "on" and "off" to strtobool

2016-01-28 Thread Kees Cook
Several places in the kernel expect to use "on" and "off" for their
boolean signifiers, so add them to strtobool.

Signed-off-by: Kees Cook <keesc...@chromium.org>
Cc: Rasmus Villemoes <li...@rasmusvillemoes.dk>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 lib/string.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index 0323c0d5629a..091570708db7 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -635,12 +635,15 @@ EXPORT_SYMBOL(sysfs_streq);
  * @s: input string
  * @res: result
  *
- * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
- * Otherwise it will return -EINVAL.  Value pointed to by res is
- * updated upon finding a match.
+ * This routine returns 0 iff the first character is one of 'Yy1Nn0', or
+ * [oO][NnFf] for "on" and "off". Otherwise it will return -EINVAL.  Value
+ * pointed to by res is updated upon finding a match.
  */
 int strtobool(const char *s, bool *res)
 {
+   if (!s)
+   return -EINVAL;
+
switch (s[0]) {
case 'y':
case 'Y':
@@ -652,6 +655,21 @@ int strtobool(const char *s, bool *res)
case '0':
*res = false;
break;
+   case 'o':
+   case 'O':
+   switch (s[1]) {
+   case 'n':
+   case 'N':
+   *res = true;
+   break;
+   case 'f':
+   case 'F':
+   *res = false;
+   break;
+   default:
+   return -EINVAL;
+   }
+   break;
default:
return -EINVAL;
}
-- 
2.6.3
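
For readers who want to try the new matching rules outside the kernel, the
following is a userspace model of the function as changed by this patch. It
is a sketch, not the kernel source: the name model_strtobool_onoff() is
made up, and the early returns are a stylistic simplification of the
break-based original.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model: accepts a first character of 'Yy1Nn0', or 'o'/'O'
 * followed by 'n'/'N' (on) or 'f'/'F' (off). Only the first two
 * characters are ever examined.
 */
int model_strtobool_onoff(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		default:
			return -EINVAL;
		}
	default:
		return -EINVAL;
	}
}

/* Illustrative usage. */
void model_strtobool_onoff_demo(void)
{
	bool v = false;

	assert(model_strtobool_onoff("on", &v) == 0 && v);
	assert(model_strtobool_onoff("Off", &v) == 0 && !v);
	assert(model_strtobool_onoff("o", &v) == -EINVAL);
}
```

Because s[1] is read whenever s[0] is 'o' or 'O', the input has to be at
least two valid bytes in that case, which is why the callers are fixed
first in patch 1/3.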



[PATCH 0/3] lib: add "on" and "off" to strtobool

2016-01-28 Thread Kees Cook
This consolidates logic for handling "on"/"off" parsing for bools into
the existing strtobool function. This requires making sure callers are
passing NUL-terminated strings.

-Kees



[PATCH 3/3] param: convert some "on"/"off" users to strtobool

2016-01-28 Thread Kees Cook
This changes several users of manual "on"/"off" parsing to use strtobool.

Signed-off-by: Kees Cook <keesc...@chromium.org>
Cc: x...@kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
---
 arch/powerpc/kernel/rtasd.c  | 10 +++---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 11 +++
 arch/s390/kernel/time.c  |  8 ++--
 arch/s390/kernel/topology.c  |  8 +++-
 arch/x86/kernel/aperture_64.c| 13 +++--
 include/linux/tick.h |  2 +-
 kernel/time/hrtimer.c| 11 +++
 kernel/time/tick-sched.c | 11 +++
 8 files changed, 21 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 5a2c049c1c61..984e67e91ba3 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -49,7 +50,7 @@ static unsigned int rtas_error_log_buffer_max;
 static unsigned int event_scan;
 static unsigned int rtas_event_scan_rate;
 
-static int full_rtas_msgs = 0;
+static bool full_rtas_msgs;
 
 /* Stop logging to nvram after first fatal error */
 static int logging_enabled; /* Until we initialize everything,
@@ -592,11 +593,6 @@ __setup("surveillance=", surveillance_setup);
 
 static int __init rtasmsgs_setup(char *str)
 {
-   if (strcmp(str, "on") == 0)
-   full_rtas_msgs = 1;
-   else if (strcmp(str, "off") == 0)
-   full_rtas_msgs = 0;
-
-   return 1;
+   return strtobool(str, &full_rtas_msgs);
 }
 __setup("rtasmsgs=", rtasmsgs_setup);
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 32274f72fe3f..bb333e9fd77a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -47,20 +48,14 @@ static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = 
CPU_STATE_OFFLINE;
 
 static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
 
-static int cede_offline_enabled __read_mostly = 1;
+static bool cede_offline_enabled __read_mostly = true;
 
 /*
  * Enable/disable cede_offline when available.
  */
 static int __init setup_cede_offline(char *str)
 {
-   if (!strcmp(str, "off"))
-   cede_offline_enabled = 0;
-   else if (!strcmp(str, "on"))
-   cede_offline_enabled = 1;
-   else
-   return 0;
-   return 1;
+   return strtobool(str, &cede_offline_enabled);
 }
 
 __setup("cede_offline=", setup_cede_offline);
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 99f84ac31307..afc7fc9684ba 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -1433,7 +1433,7 @@ device_initcall(etr_init_sysfs);
 /*
  * Server Time Protocol (STP) code.
  */
-static int stp_online;
+static bool stp_online;
 static struct stp_sstpi stp_info;
 static void *stp_page;
 
@@ -1444,11 +1444,7 @@ static struct timer_list stp_timer;
 
 static int __init early_parse_stp(char *p)
 {
-   if (strncmp(p, "off", 3) == 0)
-   stp_online = 0;
-   else if (strncmp(p, "on", 2) == 0)
-   stp_online = 1;
-   return 0;
+   return strtobool(p, &stp_online);
 }
 early_param("stp", early_parse_stp);
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 40b8102fdadb..10e388216307 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -37,7 +38,7 @@ static void set_topology_timer(void);
 static void topology_work_fn(struct work_struct *work);
 static struct sysinfo_15_1_x *tl_info;
 
-static int topology_enabled = 1;
+static bool topology_enabled = true;
 static DECLARE_WORK(topology_work, topology_work_fn);
 
 /*
@@ -444,10 +445,7 @@ static const struct cpumask *cpu_book_mask(int cpu)
 
 static int __init early_parse_topology(char *p)
 {
-   if (strncmp(p, "off", 3))
-   return 0;
-   topology_enabled = 0;
-   return 0;
+   return strtobool(p, &topology_enabled);
 }
 early_param("topology", early_parse_topology);
 
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 6e85f713641d..6608b00a516a 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -227,19 +228,11 @@ static u32 __init search_agp_bridge(u32 *order, int 
*valid_agp)
return 0;
 }
 
-static int gart_fix_e820 __initdata = 1;
+static bool gart_fix_e820 __i

Re: [PATCH v2 00/10] userns: sysctl limits for namespaces

2016-07-26 Thread Kees Cook
On Tue, Jul 26, 2016 at 10:29 AM, Michael Kerrisk (man-pages)
<mtk.manpa...@gmail.com> wrote:
> On 26 July 2016 at 18:52, Kees Cook <keesc...@chromium.org> wrote:
>> On Tue, Jul 26, 2016 at 8:06 AM, Eric W. Biederman
>> <ebied...@xmission.com> wrote:
>>> "Michael Kerrisk (man-pages)" <mtk.manpa...@gmail.com> writes:
>>>
>>>> Hello Eric,
>>>>
>>>> I realized I had a question after the last mail.
>>>>
>>>> On 07/21/2016 06:39 PM, Eric W. Biederman wrote:
>>>>>
>>>>> This patchset addresses two use cases:
>>>>> - Implement a sane upper bound on the number of namespaces.
>>>>> - Provide a way for sandboxes to limit the attack surface from
>>>>>   namespaces.
>>>>
>>>> Can you say more about the second point? What exactly is the
>>>> problem that is being addressed, and how does the patch series
>>>> address it? (It would be good to have those details in the
>>>> revised commit message...)
>>>
>>> At some point it was reported that seccomp was not sufficient to disable
>>> namespace creation.  I need to go back and look at that claim to see
>>> which set of circumstances that was referring to.  Seccomp doesn't stack
>>> so I can see why it is an issue.
>>
>> seccomp does stack. The trouble usually comes from a perception that
>> seccomp overhead is not trivial, so setting a system-wide policy is a
>> bit of a large hammer for such a limitation. Also, at the time,
>> seccomp could be bypassed with ptrace, but this (as of v4.8) is no
>> longer true.
>
> Sounds like someone needs to send me a patch for the seccomp.2 man page?

It's on my TODO list, no worries. :) I'm waiting for it to land in
Linus's tree first. It's only been in -next so far.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH v2 00/10] userns: sysctl limits for namespaces

2016-07-22 Thread Kees Cook
On Fri, Jul 22, 2016 at 11:45 AM, Eric W. Biederman
<ebied...@xmission.com> wrote:
> Colin Walters <walt...@verbum.org> writes:
>
>> On Thu, Jul 21, 2016, at 12:39 PM, Eric W. Biederman wrote:
>>>
>>> This patchset addresses two use cases:
>>> - Implement a sane upper bound on the number of namespaces.
>>> - Provide a way for sandboxes to limit the attack surface from
>>>   namespaces.
>>
>> Perhaps this is obvious, but since you didn't quite explicitly state it;
>> do you see this as obsoleting the existing downstream patches
>> mentioned in:
>> https://lwn.net/Articles/673597/
>> It seems conceptually similar to Kees' original approach, right?
>
> Similar yes, and I expect it fills the need.  My primary difference is
> that I believe this approach makes sense from a perspective of assuming
> that user namespaces or other namespaces are not any buggier than any
> other piece of kernel code and that people will use them.
>
> I don't see these limits making sense from a perspective that user
> namespaces are flawed and distro kernels should not have enabled them in
> the first place.  That was my perception right or wrong of Kees patches
> and the related patches that landed in Ubuntu and Debian.
>
> With Kees approach I could not see how to handle the case where some
> applications on the system wanted user namespaces and others don't.
> Which made it very nasty for future evolution and more deployment of
> user namespaces.  Being per user namespace these limits can be used to
> sandbox applications without affecting the rest of the system.

While it certainly works for my use-case (init ns
max_usernamespaces=0), I don't see how this helps the case of "let
user foobar open 1 userns, but everyone else is 0", which is likely
the middle ground between "just turn it off" and "everyone gets to
create usernamespaces". I'm personally not interested in that level of
granularity, but in earlier discussions it sounded like this was
something you wanted?

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH v2 00/10] userns: sysctl limits for namespaces

2016-07-26 Thread Kees Cook
On Tue, Jul 26, 2016 at 8:06 AM, Eric W. Biederman
<ebied...@xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpa...@gmail.com> writes:
>
>> Hello Eric,
>>
>> I realized I had a question after the last mail.
>>
>> On 07/21/2016 06:39 PM, Eric W. Biederman wrote:
>>>
>>> This patchset addresses two use cases:
>>> - Implement a sane upper bound on the number of namespaces.
>>> - Provide a way for sandboxes to limit the attack surface from
>>>   namespaces.
>>
>> Can you say more about the second point? What exactly is the
>> problem that is being addressed, and how does the patch series
>> address it? (It would be good to have those details in the
>> revised commit message...)
>
> At some point it was reported that seccomp was not sufficient to disable
> namespace creation.  I need to go back and look at that claim to see
> which set of circumstances that was referring to.  Seccomp doesn't stack
> so I can see why it is an issue.

seccomp does stack. The trouble usually comes from a perception that
seccomp overhead is not trivial, so setting a system-wide policy is a
bit of a large hammer for such a limitation. Also, at the time,
seccomp could be bypassed with ptrace, but this (as of v4.8) is no
longer true.

> The general problem is that namespaces by their nature (and especially
> in combination with the user namespaces) allow unprivileged users to use
> more of the kernel than a user would have access to without them.  This
> in turn allows malicious users more kernel calls they can use in attempt
> to find an exploitable bug.
>
> So if you are building a sandbox/chroot jail/chromium tab or anything
> like that and you know you won't be needing a kernel feature having an
> easy way to disable the feature is useful for making the kernel
> marginally more secure, as certain attack vectors are no longer
> possible.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [RFC 0/4] RFC: Add Checmate, BPF-driven minor LSM

2016-08-08 Thread Kees Cook
ective, programmatic examination of kernel
structures means you can trivially leak kernel memory locations and
contents. Resisting these sorts of leaks needs to be addressed too.

This looks like a subset of kprobes but available to non-root users,
which looks rather scary to me at first glance. :)

-Kees

>
> I would love to know what y'all think.
>
> Sargun Dhillon (4):
>   bpf: move tracing helpers to shared helpers
>   bpf, security: Add Checmate
>   security/checmate: Add Checmate sample
>   bpf: Restrict Checmate bpf programs to current kernel ABI
>
>  include/linux/bpf.h  |   2 +
>  include/linux/checmate.h |  38 +
>  include/uapi/linux/Kbuild|   1 +
>  include/uapi/linux/bpf.h |   1 +
>  include/uapi/linux/checmate.h|  65 +
>  include/uapi/linux/prctl.h   |   3 +
>  kernel/bpf/helpers.c |  34 +
>  kernel/bpf/syscall.c |   2 +-
>  kernel/trace/bpf_trace.c |  33 -
>  samples/bpf/Makefile |   4 +
>  samples/bpf/bpf_load.c   |  11 +-
>  samples/bpf/checmate1_kern.c |  28 
>  samples/bpf/checmate1_user.c |  54 +++
>  security/Kconfig |   1 +
>  security/Makefile|   2 +
>  security/checmate/Kconfig|   6 +
>  security/checmate/Makefile   |   3 +
>  security/checmate/checmate_bpf.c |  67 +
>  security/checmate/checmate_lsm.c | 304 
> +++
>  19 files changed, 622 insertions(+), 37 deletions(-)
>  create mode 100644 include/linux/checmate.h
>  create mode 100644 include/uapi/linux/checmate.h
>  create mode 100644 samples/bpf/checmate1_kern.c
>  create mode 100644 samples/bpf/checmate1_user.c
>  create mode 100644 security/checmate/Kconfig
>  create mode 100644 security/checmate/Makefile
>  create mode 100644 security/checmate/checmate_bpf.c
>  create mode 100644 security/checmate/checmate_lsm.c
>
> --
> 2.7.4
>



-- 
Kees Cook
Nexus Security


Re: [RFC 0/4] RFC: Add Checmate, BPF-driven minor LSM

2016-08-08 Thread Kees Cook
On Mon, Aug 8, 2016 at 5:00 PM, Sargun Dhillon <sar...@sargun.me> wrote:
> On Mon, Aug 08, 2016 at 04:44:02PM -0700, Kees Cook wrote:
>> On Thu, Aug 4, 2016 at 12:11 AM, Sargun Dhillon <sar...@sargun.me> wrote:
>> > I distributed this patchset to linux-security-mod...@vger.kernel.org 
>> > earlier,
>> > but based on the fact that the archive is down, and this is a fairly
>> > broad-sweeping proposal, I figured I'd grow the audience a little bit. 
>> > Sorry
>> > if you received this multiple times.
>> >
>> > I've begun building out the skeleton of a Linux Security Module, and I'd 
>> > like to
>> > get feedback on it. It's a skeleton, and I've only populated a few hooks, 
>> > so I'm
>> > mostly looking for input on the general proposal, interest, and design. 
>> > It's a
>> > minor LSM. My particular use case is one in which containers are being
>> > dynamically deployed to machines by internal developers in a different 
>> > group.
>> > The point of Checmate is to act as an extensible bed for _safe_, complex
>> > security policies. It's nice to enable dynamic security policies that can 
>> > be
>> > defined in C, and change as necessary, without ever having to patch, or
>> > rebuild
>> > the kernel.
>> >
>> > For many of these containers, the security policies can be fairly nuanced. 
>> > One
>> > particular one to take into account is network security. Often times,
>> > administrators want to prevent ingress, and egress connectivity except 
>> > from a
>> > few select IPs. Egress filtering can be managed using net_cls, but without
>> > modifying running software, it's non-trivial to attach a filter to all 
>> > sockets
>> > being created within a container. The inet_conn_request, socket_recvmsg,
>> > socket_sock_rcv_skb hooks make this trivial to implement.
>> >
>> > Other times, containers need to be throttled in places where there's not 
>> > really
>> > a good place to impose that policy for software which isn't built 
>> > in-house.  If
>> > one wants to limit file creations/sec, or reject I/O under certain
>> > characteristics, there's not a great place to do it now. This gives 
>> > engineers a
>> > mechanism to write those policies.
>> >
>> > This same flexibility can be used to take existing programs and enable 
>> > safe BPF
>> > helpers to modify memory to allow rules to pass. One example that I 
>> > prototyped
>> > was Docker's port mapping, which has an overhead (DNAT), and there's some 
>> > loss
>> > of fidelity in the BSD Socket API to identify what's going on. Instead, we 
>> > can
>> > just rewrite the port in a bind, based upon some data in a BPF map, and a 
>> > cgroup
>> > match.
>> >
>> > I can actually see other minor security modules being implemented in 
>> > Checmate,
>> > for example, Yama, or the recently proposed Hardchroot could be 
>> > reimplemented in
>> > BPF. Potentially, they could even be API compatible.
>> >
>> > Although, at first, much of this sounds like seccomp, it's quite 
>> > different. For
>> > one, what we can do in the security hooks is more complex (access to kernel
>> > pointers). The other side of this is we can have effects on a system-wide,
>> > or cgroup level. This also circumvents the need for CRIU-friendly policies.
>> >
>> > Lastly, the flexibility of this mechanism allows for prevention of security
>> > vulnerabilities which are often complex in nature and require the 
>> > interaction
>> > of multiple hooks (CVE-2014-9717 is a good example), and although ksplice,
>> > and livepatch exist, they're not always easy to use, as compared to loading
>> > a single bpf program across all kernels.
>> >
>> > The user-facing API is exposed via prctl as it's meant to be very simple 
>> > (at
>> > least the kernel components). It only has three operations. For a given 
>> > security
>> > hook, you can attach a BPF program to it, which will add it to the set of
>> > programs that are executed over when the hook is hit. You can reset a hook,
>> > which removes all program associated with a given hook, and you can set a
>> > deny_reset flag on a hook to prevent anyone from resetting it. It's likely 
>> > that
>> > an individual would want to set this in any production use case.
>>

Re: arch: arm: bpf: Converting cBPF to eBPF for arm 32 bit

2017-02-08 Thread Kees Cook
On Wed, Feb 1, 2017 at 5:01 AM, Shubham Bansal
<illusionist@gmail.com> wrote:
> Hi Kees & Daniel,
>
> On Tue, Jan 31, 2017 at 09:44:56AM -0800, Kees Cook wrote:
>> >> > 1.) Currently, as eBPF uses 64 bit registers, I am mapping 64 bit eBPF
>> >> > registers with 32 bit arm registers which looks wrong to me. Do anybody
>> >> > have some idea about how to map eBPF->arm 32 bit registers ?
>> >>
>> >> I was going to say "look at the x86 32-bit implementation." ... But
>> >> there isn't one. :( I'm going to guess that there isn't a very good
>> >> answer here. I assume you'll have to build some kind of stack scratch
>> >> space to load/save.
>> >
>> >
>> > Now I see why nobody has implemented eBPF JIT for the 32 bit systems. I
>> > think its very difficult to implement it without any complications and
>> > errors.
>>
>> Yeah, that does seem to make it much more difficult.
> I was thinking of first implementing only instructions with 32 bit
> register operands. It will hugely decrease the surface area of eBPF
> instructions that I have to cover for the first patch.

I don't know much about eBPF internals, but I can take a crack at
answering this... I assume whatever you implement would need to pass
the BPF regression tests...

> So, What I am thinking is something like this :
>
> - bpf_mov r0(64),r1(64) will be JITed like this :
> - ar1(32) <- r1(64). Convert/Mask 64 bit ebpf register(r1) value into 32
> bit and store it in arm register(ar1).
> - Do MOV ar0(32),ar1(32) as an ARM instruction.
> - ar0(32) -> r0(64). Zero Extend the ar0 32 bit register value
> and store it in 64 bit ebpf register r0.

It seems like you're suggesting truncating the 64-bit register values?
I think your best solution is going to be to use a memory scratch
space and build 64-bit operations using 32-bit registers and memory
operations.
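
For illustration only (not code from this thread): a minimal userspace C
sketch of what "64-bit operations using 32-bit registers" can look like for
an add. The struct reg64 layout and the helper names are assumptions made up
for the example; a real JIT would emit the equivalent native instructions
instead of running C.

```c
#include <stdint.h>

/* Hypothetical layout: one 64-bit eBPF register kept as two 32-bit words,
 * for example in an ARM register pair or a stack scratch slot. */
struct reg64 {
	uint32_t lo;
	uint32_t hi;
};

static struct reg64 reg64_from_u64(uint64_t v)
{
	struct reg64 r = { .lo = (uint32_t)v, .hi = (uint32_t)(v >> 32) };
	return r;
}

static uint64_t reg64_to_u64(struct reg64 r)
{
	return ((uint64_t)r.hi << 32) | r.lo;
}

/* 64-bit add built only from 32-bit additions. */
static struct reg64 add64(struct reg64 a, struct reg64 b)
{
	struct reg64 r;

	r.lo = a.lo + b.lo;
	/* The low-word sum wrapped exactly when it is smaller than an operand,
	 * so that comparison yields the carry into the high word. */
	r.hi = a.hi + b.hi + (r.lo < a.lo);
	return r;
}
```

On ARM this pattern corresponds to an ADDS on the low words followed by an
ADC on the high words, so no information is lost the way it would be with
masking to 32 bits.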

> - Similarly, For all BPF_ALU class instructions.
> - For BPF_ADD, I will mask the addition result to 32 bit only.
>  I am not sure, Overflow might be a problem.
> - For BPF_SUB, I will mask the subtraction result to 32 bit only.
>  I am not sure, Underflow might be problem.
> - For BPF_MUL, similar to BPF_ADD. Overflow Problem ?
> - For BPF_DIV, 32 bit masking should be fine, I guess.
> - For BPF_OR, BPF_AND, BPF_XOR, BPF_LSH, BPF_RSH, BPF_MOD 32 bit
>  masking should be fine.
> - For BPF_NEG and BPF_ARSH, might be a problem because of the sign bit.
> - For BPF_END, 32 bit masking should work fine.
>  Let me know if any of the above point is wrong or need your suggestion.
>
> - Although, for ALU instructions, there is a big problem of register
>   flag manipulations. Generally, architecture's ABI takes care of this
>   part but as we are doing 64 bit Instructions emulation(kind of) on 32
>   bit machine, it needs to be done manually. Does that sound correct ?

You can't truncate, but you'll have to build 64-bit ops using 32-bit registers.
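
As a rough sketch of the same point for BPF_MUL (userspace C, not from this
thread; the function names are hypothetical): a 64-bit multiply can be
composed from 32-bit halves, and it gives a different answer than multiplying
truncated values.

```c
#include <stdint.h>

/* 64-bit multiply (modulo 2^64) composed from 32-bit halves. */
static uint64_t mul64_from_halves(uint32_t a_lo, uint32_t a_hi,
				  uint32_t b_lo, uint32_t b_hi)
{
	/* Full 32x32 -> 64 product of the low words. */
	uint64_t low = (uint64_t)a_lo * b_lo;
	/* Cross terms land in the high word, so only their low 32 bits
	 * matter; a_hi * b_hi would be shifted out entirely. */
	uint32_t cross = a_lo * b_hi + a_hi * b_lo;

	return low + ((uint64_t)cross << 32);
}

/* Convenience wrapper that splits two 64-bit values into halves first. */
static uint64_t mul64(uint64_t a, uint64_t b)
{
	return mul64_from_halves((uint32_t)a, (uint32_t)(a >> 32),
				 (uint32_t)b, (uint32_t)(b >> 32));
}
```

For example, 0x100000000 * 3 is 0x300000000 here, while masking both
operands to 32 bits first would give 0.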

>
> - I am not JITing BPF_ALU64 class instructions as of now. As we have to
>   take care of atomic instructions and race conditions with these
>   instruction which looks complicated to me as of now. Will try to figure out
>   this part and implement it later. Currently, I will just let it be
>   interpreted by the ebpf interpreter.
>
> - For BPF_JMP class, I am assuming that, although eBPF is 64 bit ABI,
>   the address pointers on 32 bit arch like arm will be of 32 bit only.
>   So, for BPF_JMP, masking the 64 bit destination address to 32 bit
>   should do the trick and no address will be corrupted in this way. Am I
>   correct to assume this ?
>   Also, I need to check for address getting out of the allowed memory
>   range.

That's probably true, but the JIT should likely detect a truncation
here, if you're going to depend on it, and reject the BPF.
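
A minimal sketch of such a truncation check, in plain userspace C (the
helper name is made up for this example and is not an existing kernel
function):

```c
#include <stdbool.h>
#include <stdint.h>

/* Narrow a 64-bit value to 32 bits only if nothing would be lost.
 * Returns false, leaving *out untouched, when the upper 32 bits are set. */
static bool narrow_u64_to_u32(uint64_t v, uint32_t *out)
{
	if (v > UINT32_MAX)
		return false;
	*out = (uint32_t)v;
	return true;
}
```

A caller that gets false back would reject the program (or fall back to the
interpreter) rather than silently using a masked address.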

> - For BPF_LD, BPF_LDX, BPF_ST and BPF_STX class instructions, I am
>   assuming the same thing as above - All addresses and pointers are 32
>   bit - which can be taken care of just by masking the eBPF register
>   values. Does that sound correct ?
>   Also, I need to check for the address overflow, address getting out
>   of the allowed memory range and things like that.

I'd say, get something working and send a patch -- that's likely the
best way to get more detailed feedback. :)

-Kees

-- 
Kees Cook
Pixel Security


Re: arch: arm: bpf: Converting cBPF to eBPF for arm 32 bit

2017-01-30 Thread Kees Cook
On Mon, Jan 30, 2017 at 2:38 AM, Shubham Bansal
<illusionist@gmail.com> wrote:
> Hi all,
>
> Please ignore last copy of this mail. Kernel mailing lists bounced my
> last mail back because of HTML content.
>
> Just starting a new thread with proper heading on the main kernel
> hardening and net-dev mailing list so that other people can be involved
> in this. Please don't take this as a personal mail.
>
> I am working on conversion of arm32 cBPF into eBPF JIT. I wanted some
> help, regarding understanding of kernel code, from the dev available on
> the mailing list. If you look at the ./arch/arm/net/bpf_jit_32.c code,
> you will see jit_ctx structure. If anybody could help me understand what
> each fields of this structure represent then it would be great.
>
> Also, currently I am mapping the eBPF registers to arm 32 bit registers
> in the following way.
>
>> static const int bpf2a32[] = {
>>         /* return value from in-kernel function, and exit value from eBPF */
>>         [BPF_REG_0] = ARM_R0,
>>         /* arguments from eBPF program to in-kernel function */
>>         [BPF_REG_1] = ARM_R1,
>>         [BPF_REG_2] = ARM_R2,
>>         [BPF_REG_3] = ARM_R3,
>>         [BPF_REG_4] = ARM_R4,
>>         [BPF_REG_5] = ARM_R5,
>>         /* callee saved registers that in-kernel function will preserve */
>>         [BPF_REG_6] = ARM_R6,
>>         [BPF_REG_7] = ARM_R7,
>>         [BPF_REG_8] = ARM_R8,
>>         [BPF_REG_9] = ARM_R9,
>>         /* Read only Frame Pointer to access Stack */
>>         [BPF_REG_FP] = ARM_FP,
>>         /* Temporary Register for internal BPF JIT */
>>         [TMP_REG_1] = ARM_R11,
>>         /* temporary register for blinding constants */
>>         [BPF_REG_AX] = ARM_R10,
>> };
>
> But I have some question if anybody could help with those.
>
> 1.) Currently, as eBPF uses 64 bit registers, I am mapping 64 bit eBPF
> registers with 32 bit arm registers which looks wrong to me. Do anybody
> have some idea about how to map eBPF->arm 32 bit registers ?

I was going to say "look at the x86 32-bit implementation." ... But
there isn't one. :( I'm going to guess that there isn't a very good
answer here. I assume you'll have to build some kind of stack scratch
space to load/save.
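
To make the "stack scratch space" idea a bit more concrete, here is a small
userspace C sketch. It is only an assumption about one possible layout, and
the hyp_ names are invented for the example: every 64-bit eBPF register gets
two 32-bit slots that 32-bit code can load and store separately.

```c
#include <stdint.h>

#define HYP_NR_BPF_REGS 11	/* eBPF r0..r10; macro name is made up */

/* Hypothetical scratch area: [reg][0] holds the low word of an eBPF
 * register, [reg][1] holds the high word. */
struct hyp_scratch {
	uint32_t word[HYP_NR_BPF_REGS][2];
};

/* Save a 64-bit register value as two separate 32-bit stores. */
static void hyp_scratch_store(struct hyp_scratch *s, unsigned int reg,
			      uint64_t v)
{
	s->word[reg][0] = (uint32_t)v;
	s->word[reg][1] = (uint32_t)(v >> 32);
}

/* Rebuild the 64-bit value from two separate 32-bit loads. */
static uint64_t hyp_scratch_load(const struct hyp_scratch *s, unsigned int reg)
{
	return ((uint64_t)s->word[reg][1] << 32) | s->word[reg][0];
}
```

Whether a given eBPF register lives in such a slot or in a pair of native
registers is a design choice for the JIT; the sketch only shows the
load/save side.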

> 2.) Also, is my current mapping good enough to make the JIT fast enough ?
> because as you might know, eBPF JIT mostly depends on 1-to-1 mapping of
> its instructions with native instructions.

I don't know -- it might be tricky with needing to deal with 64-bit
registers. But if you can make it faster than the non-JIT, it should
be a win. :) Yay assembly.

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH 5/9] treewide: use kv[mz]alloc* rather than opencoded variants

2017-01-30 Thread Kees Cook
On Mon, Jan 30, 2017 at 1:49 AM, Michal Hocko <mho...@kernel.org> wrote:
> From: Michal Hocko <mho...@suse.com>
>
> There are many code paths opencoding kvmalloc. Let's use the helper
> instead. The main difference to kvmalloc is that those users are usually
> not considering all the aspects of the memory allocator. E.g. allocation
> requests <= 32kB (with 4kB pages) are basically never failing and invoke
> OOM killer to satisfy the allocation. This sounds too disruptive for
> something that has a reasonable fallback - the vmalloc. On the other
> hand those requests might fallback to vmalloc even when the memory
> allocator would succeed after several more reclaim/compaction attempts
> previously. There is no guarantee something like that happens though.
>
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
>
> Changes since v1
> - add kvmalloc_array - this might silently fix some overflow issues
>   because most users simply didn't check the overflow for the vmalloc
>   fallback.

Awesome, thanks for adding that API. :)
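
For readers unfamiliar with the overflow issue mentioned above, a userspace
C sketch of the kind of check an array allocator can do before multiplying.
This is not the kernel's kvmalloc_array() implementation; the function name
is hypothetical and malloc() stands in for the kernel allocators.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for n elements of the given size, refusing requests whose
 * byte count does not fit in size_t instead of letting n * size wrap. */
static void *checked_alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */
	return malloc(n * size);
}
```

An open-coded "malloc(n * size)" with a large n could wrap around and return
a buffer far smaller than the caller expects, which is the class of bug the
array helper avoids.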

Acked-by: Kees Cook <keesc...@chromium.org>

-Kees

-- 
Kees Cook
Nexus Security


Re: sctp: kernel memory overwrite attempt detected in sctp_getsockopt_assoc_stats

2017-01-17 Thread Kees Cook
On Mon, Jan 16, 2017 at 6:56 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Mon, Jan 16, 2017 at 3:50 PM, David Laight <david.lai...@aculab.com> wrote:
>> From: Dmitry Vyukov
>>> Sent: 16 January 2017 14:04
>>> >> >> I've enabled CONFIG_HARDENED_USERCOPY_PAGESPAN on syzkaller fuzzer and
>> ...
>>> >> The code also takes into account compound pages. As far as I
>>> >> understand the intention of the check is to effectively find
>>> >> out-of-bounds copies (e.g. goes beyond the current heap allocation). I
>>> >> would expect that stacks are allocated as compound pages and don't
>>> >> trigger this check. I don't see it is firing in other similar places.
>>> >>
>>> > Honestly, I'm not overly familiar with stack page allocation, at least not so
>>> > far as compound vs. single page allocation is concerned.  I suppose the question
>>> > you're really asking here is: Have you found a case in which the syscall fuzzer
>>> > has forced the allocation of an insecure non-compound page on the stack, or is
>>> > this a false positive warning.  I can't provide the answer to that.
>>>
>>> Yes. I added Kees, author of CONFIG_HARDENED_USERCOPY_PAGESPAN, to To line.
>>> Kees, is this a false positive?
>>
>> I'd guess that the kernel stack is (somehow) allocated page by page
>> rather than by a single multi-page allocate.
>> Or maybe vmalloc() isn't setting the required flag??
>
>
> Just in case, I don't have CONFIG_VMAP_STACK selected.
> If it is a generic issue, then CONFIG_HARDENED_USERCOPY_PAGESPAN looks
> considerably broken as there are tons of copies onto stack. I don't
> see what's special in this particular case.

There have been so many false positives on this option, even though it
is known not to be quite right, that I'll probably just remove it
entirely. It clearly needs much more work before it'll be useful, so
there's no reason to leave it in the kernel to confuse people. :)

-Kees

-- 
Kees Cook
Nexus Security


Re: [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup

2016-10-05 Thread Kees Cook
On Wed, Oct 5, 2016 at 1:58 PM, Mickaël Salaün <m...@digikod.net> wrote:
>
>
> On 04/10/2016 01:43, Kees Cook wrote:
>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <m...@digikod.net> wrote:
>>> This allows to add new eBPF programs to Landlock hooks dedicated to a
>>> cgroup thanks to the BPF_PROG_ATTACH command. Like for socket eBPF
>>> programs, the Landlock hooks attached to a cgroup are propagated to the
>>> nested cgroups. However, when a new Landlock program is attached to one
>>> of this nested cgroup, this cgroup hierarchy fork the Landlock hooks.
>>> This design is simple and match the current CONFIG_BPF_CGROUP
>>> inheritance. The difference lie in the fact that Landlock programs can
>>> only be stacked but not removed. This match the append-only seccomp
>>> behavior. Userland is free to handle Landlock hooks attached to a cgroup
>>> in more complicated ways (e.g. continuous inheritance), but care should
>>> be taken to properly handle error cases (e.g. memory allocation errors).
>>>
>>> Changes since v2:
>>> * new design based on BPF_PROG_ATTACH (suggested by Alexei Starovoitov)
>>>
>>> Signed-off-by: Mickaël Salaün <m...@digikod.net>
>>> Cc: Alexei Starovoitov <a...@kernel.org>
>>> Cc: Andy Lutomirski <l...@amacapital.net>
>>> Cc: Daniel Borkmann <dan...@iogearbox.net>
>>> Cc: Daniel Mack <dan...@zonque.org>
>>> Cc: David S. Miller <da...@davemloft.net>
>>> Cc: Kees Cook <keesc...@chromium.org>
>>> Cc: Tejun Heo <t...@kernel.org>
>>> Link: https://lkml.kernel.org/r/20160826021432.ga8...@ast-mbp.thefacebook.com
>>> Link: https://lkml.kernel.org/r/20160827204307.ga43...@ast-mbp.thefacebook.com
>>> ---
>>>  include/linux/bpf-cgroup.h  |  7 +++
>>>  include/linux/cgroup-defs.h |  2 ++
>>>  include/linux/landlock.h|  9 +
>>>  include/uapi/linux/bpf.h|  1 +
>>>  kernel/bpf/cgroup.c | 33 ++---
>>>  kernel/bpf/syscall.c| 11 +++
>>>  security/landlock/lsm.c | 40 +++-
>>>  security/landlock/manager.c | 32 
>>>  8 files changed, 131 insertions(+), 4 deletions(-)
>>>
>>> [...]
>>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>>> index 7b75fa692617..1c18fe46958a 100644
>>> --- a/kernel/bpf/cgroup.c
>>> +++ b/kernel/bpf/cgroup.c
>>> @@ -15,6 +15,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
>>>  EXPORT_SYMBOL(cgroup_bpf_enabled_key);
>>> @@ -31,7 +32,15 @@ void cgroup_bpf_put(struct cgroup *cgrp)
>>> union bpf_object pinned = cgrp->bpf.pinned[type];
>>>
>>> if (pinned.prog) {
>>> -   bpf_prog_put(pinned.prog);
>>> +   switch (type) {
>>> +   case BPF_CGROUP_LANDLOCK:
>>> +#ifdef CONFIG_SECURITY_LANDLOCK
>>> +   put_landlock_hooks(pinned.hooks);
>>> +   break;
>>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>>> +   default:
>>> +   bpf_prog_put(pinned.prog);
>>> +   }
>>> static_branch_dec(&cgroup_bpf_enabled_key);
>>> }
>>> }
>>
>> I get creeped out by type-controlled unions of pointers. :P I don't
>> have a suggestion to improve this, but I don't like seeing a pointer
>> type managed separately from the pointer itself as it tends to bypass
>> a lot of both static and dynamic checking. A union is better than a
>> cast of void *, but it still worries me. :)
>
> This is not fully satisfactory for me neither but the other approach is
> to use two distinct struct fields instead of a union.
> Do you prefer if there is a "type" field in the "pinned" struct to
> select the union?

Since memory usage isn't a huge deal for this, I'd actually prefer
there just be no union at all. Have a type field, and a distinct
pointer field for each type you're expecting to use. That way there
can never be confusion between types and you could even validate that
only a single field type has been populated, etc.
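
A small sketch of that layout in plain C, with made-up hyp_ names standing in
for the real types (this is an illustration, not the structure from the
patch):

```c
#include <stdbool.h>
#include <stddef.h>

struct hyp_prog;	/* opaque stand-ins for the two pointee types */
struct hyp_hooks;

enum hyp_pinned_type {
	HYP_PINNED_NONE,
	HYP_PINNED_PROG,
	HYP_PINNED_HOOKS,
};

/* No union: a type tag plus one distinct pointer field per type. */
struct hyp_pinned {
	enum hyp_pinned_type type;
	struct hyp_prog *prog;
	struct hyp_hooks *hooks;
};

/* True only if exactly the field selected by the tag is populated. */
static bool hyp_pinned_valid(const struct hyp_pinned *p)
{
	switch (p->type) {
	case HYP_PINNED_NONE:
		return !p->prog && !p->hooks;
	case HYP_PINNED_PROG:
		return p->prog && !p->hooks;
	case HYP_PINNED_HOOKS:
		return !p->prog && p->hooks;
	}
	return false;
}
```

The cost is one extra pointer per entry; in exchange a mismatched tag and
pointer can be detected at run time and each field keeps its own static type.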

-Kees

-- 
Kees Cook
Nexus Security


Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing

2016-10-03 Thread Kees Cook
On Tue, Sep 20, 2016 at 10:08 AM, Mickaël Salaün <m...@digikod.net> wrote:
>
> On 15/09/2016 11:19, Pavel Machek wrote:
>> Hi!
>>
>>> This series is a proof of concept to fill some missing part of seccomp as the
>>> ability to check syscall argument pointers or creating more dynamic security
>>> policies. The goal of this new stackable Linux Security Module (LSM) called
>>> Landlock is to allow any process, including unprivileged ones, to create
>>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>>> bugs or unexpected/malicious behaviors in userland applications.
>>>
>>> The first RFC [1] was focused on extending seccomp while staying at the syscall
>>> level. This brought a working PoC but with some (mitigated) ToCToU race
>>> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
>>> syscall argument evaluation (hence the LSM hooks).
>>
>> Long and nice description follows. Should it go to Documentation/
>> somewhere?
>>
>> Because some documentation would be useful...
>>   Pavel
>
> Right, but I was looking for feedback before investing in documentation. :)

Heh, understood. There are a number of grammar issues that slow me
down when reading this, so when it does move into Documentation/, I'll
have some English nit-picks. :)

While reading I found myself wanting an explicit list of "guiding
principles" for anyone implementing new hooks. It is touched on in
several places (don't expose things, don't allow for privilege
changes, etc). Having that spelled out somewhere would be nice.

-Kees

-- 
Kees Cook
Nexus Security


Re: [RFC v3 07/22] landlock: Handle file comparisons

2016-10-03 Thread Kees Cook
On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <m...@digikod.net> wrote:
> Add eBPF functions to compare file system access with a Landlock file
> system handle:
> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>   This function allows to compare the dentry, inode, device or mount
>   point of the currently accessed file, with a reference handle.
> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>   This function allows an eBPF program to check if the current accessed
>   file is the same or in the hierarchy of a reference handle.
>
> The goal of file system handle is to abstract kernel objects such as a
> struct file or a struct inode. Userland can create this kind of handle
> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
> landlock_handle containing the handle type (e.g.
> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
> also be any descriptions able to match a struct file or a struct inode
> (e.g. path or glob string).
>
> Changes since v2:
> * add MNT_INTERNAL check to only add file handle from user-visible FS
>   (e.g. no anonymous inode)
> * replace struct file* with struct path* in map_landlock_handle
> * add BPF protos
> * fix bpf_landlock_cmp_fs_prop_with_struct_file()
>
> Signed-off-by: Mickaël Salaün <m...@digikod.net>
> Cc: Alexei Starovoitov <a...@kernel.org>
> Cc: Andy Lutomirski <l...@amacapital.net>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: David S. Miller <da...@davemloft.net>
> Cc: James Morris <james.l.mor...@oracle.com>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Serge E. Hallyn <se...@hallyn.com>
> Link: https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com
> ---
>  include/linux/bpf.h|  10 +++
>  include/uapi/linux/bpf.h   |  49 +++
>  kernel/bpf/arraymap.c  |  21 +
>  kernel/bpf/verifier.c  |   8 ++
>  security/landlock/Makefile |   2 +-
>  security/landlock/checker_fs.c | 179 +
>  security/landlock/checker_fs.h |  20 +
>  security/landlock/lsm.c|   6 ++
>  8 files changed, 294 insertions(+), 1 deletion(-)
>  create mode 100644 security/landlock/checker_fs.c
>  create mode 100644 security/landlock/checker_fs.h
> [...]
> diff --git a/security/landlock/checker_fs.c b/security/landlock/checker_fs.c
> new file mode 100644
> index ..39eb85dc7d18
> --- /dev/null
> +++ b/security/landlock/checker_fs.c
> @@ -0,0 +1,179 @@
> +/*
> + * Landlock LSM - File System Checkers
> + *
> + * Copyright (C) 2016  Mickaël Salaün <m...@digikod.net>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + */
> +
> +#include  /* enum bpf_map_array_op */
> +#include 
> +#include  /* path_is_under() */
> +#include  /* struct path */
> +
> +#include "checker_fs.h"
> +
> +#define EQUAL_NOT_NULL(a, b) (a && a == b)
> +
> +/*
> + * bpf_landlock_cmp_fs_prop_with_struct_file
> + *
> + * Cf. include/uapi/linux/bpf.h
> + */
> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
> +   u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
> +{
> +   u8 property = (u8) r1_property;
> +   struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> +   enum bpf_map_array_op map_op = r3_map_op;
> +   struct file *file = (struct file *) (unsigned long) r4_file;
> +   struct bpf_array *array = container_of(map, struct bpf_array, map);
> +   struct path *p1, *p2;
> +   struct map_landlock_handle *handle;
> +   int i;
> +
> +   /* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
> +   if (unlikely(!map)) {
> +   WARN_ON(1);
> +   return -EFAULT;
> +   }

Just some minor style/readability nits...

This is more readable as:

if (WARN_ON(!map))
return -EFAULT;

(WARN_ON already includes the unlikely() and passes through the test result.)
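
To show what "passes through the test result" means, here is a userspace
imitation of the pattern. It is a sketch only: HYP_WARN_ON is a made-up
macro, not the kernel's WARN_ON, and it relies on the GNU C statement
expression extension (gcc or clang).

```c
#include <errno.h>
#include <stdio.h>

/* Evaluate cond once, print a warning if it is true, and yield the
 * truth value so the macro can be used directly inside an if (). */
#define HYP_WARN_ON(cond) ({						\
	int hit_ = !!(cond);						\
	if (__builtin_expect(hit_, 0))					\
		fprintf(stderr, "warning at %s:%d\n", __FILE__, __LINE__); \
	hit_;								\
})

/* Same shape as the suggested rewrite: warn and bail out in one test. */
static int hyp_check_map(const void *map)
{
	if (HYP_WARN_ON(!map))
		return -EFAULT;
	return 0;
}
```

Calling hyp_check_map(NULL) prints one warning line to stderr and returns
-EFAULT; a non-NULL pointer returns 0 silently.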

> +   if (unlikely(!file))
> +   return -ENOENT;
> +   if (unlikely((property | _LANDLOCK_FLAG_FS_MASK) != _LANDLOCK_FLAG_FS_MASK))
> +   return -EINVAL;
> +
> +   /* for now, only handle OP_OR */
> +   switch (map_op) {
> +   case BPF_MAP_ARRAY_OP_OR:
> +   break;
> +   case BPF_MAP_ARRAY_OP_UNSPEC:
> +   case BPF_MAP_ARRAY_OP_AND:
> +   case BPF_MAP_ARRAY_OP_XOR:
> +   default:
> +  

Re: [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles

2016-10-03 Thread Kees Cook
On Wed, Sep 14, 2016 at 12:23 AM, Mickaël Salaün <m...@digikod.net> wrote:
> This new arraymap looks like a set and brings new properties:
> * strong typing of entries: the eBPF functions get the array type of
>   elements instead of CONST_PTR_TO_MAP (e.g.
>   CONST_PTR_TO_LANDLOCK_HANDLE_FS);
> * force sequential filling (i.e. replace or append-only update), which
>   allow quick browsing of all entries.
>
> This strong typing is useful to statically check if the content of a map
> can be passed to an eBPF function. For example, Landlock use it to store
> and manage kernel objects (e.g. struct file) instead of dealing with
> userland raw data. This improve efficiency and ensure that an eBPF
> program can only call functions with the right high-level arguments.
>
> The enum bpf_map_handle_type list low-level types (e.g.
> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
> updating a map entry (handle). This handle types are used to infer a
> high-level arraymap type which are listed in enum bpf_map_array_type
> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).
>
> For now, this new arraymap is only used by Landlock LSM (cf. next
> commits) but it could be useful for other needs.
>
> Changes since v2:
> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
>   handle entries (suggested by Andy Lutomirski)
> * remove useless checks
>
> Changes since v1:
> * arraymap of handles replace custom checker groups
> * simpler userland API
>
> Signed-off-by: Mickaël Salaün <m...@digikod.net>
> Cc: Alexei Starovoitov <a...@kernel.org>
> Cc: Andy Lutomirski <l...@amacapital.net>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: David S. Miller <da...@davemloft.net>
> Cc: Kees Cook <keesc...@chromium.org>
> Link: https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com
> ---
>  include/linux/bpf.h  |  14 
>  include/uapi/linux/bpf.h |  18 +
>  kernel/bpf/arraymap.c| 203 +++
>  kernel/bpf/verifier.c|  12 ++-
>  4 files changed, 246 insertions(+), 1 deletion(-)
>
> [...]
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index a2ac051c342f..94256597eacd 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> [...]
> +   /*
> +* Limit number of entries in an arraymap of handles to the maximum
> +* number of open files for the current process. The maximum number of
> +* handle entries (including all arraymaps) for a process is then
> +* (RLIMIT_NOFILE - 1) * RLIMIT_NOFILE. If the process' RLIMIT_NOFILE
> +* is 0, then any entry update is forbidden.
> +*
> +* An eBPF program can inherit all the arraymap FD. The worse case is
> +* to fill a bunch of arraymaps, create an eBPF program, close the
> +* arraymap FDs, and start again. The maximum number of arraymap
> +* entries can then be close to RLIMIT_NOFILE^3.
> +*
> +* FIXME: This should be improved... any idea?
> +*/
> +   if (unlikely(index >= rlimit(RLIMIT_NOFILE)))
> +   return -EMFILE;

I'm not sure what's best for resource management here. Landlock will
be holding open path structs, for example, but how are you expecting
to track things like network policies? An allowed IP address, for
example, doesn't have a handle outside of doing a full
socket()/connect() setup.

I think an explicit design for resource management should be
considered up front...

-Kees

-- 
Kees Cook
Nexus Security


Re: [RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup

2016-10-03 Thread Kees Cook
On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <m...@digikod.net> wrote:
> This allows to add new eBPF programs to Landlock hooks dedicated to a
> cgroup thanks to the BPF_PROG_ATTACH command. Like for socket eBPF
> programs, the Landlock hooks attached to a cgroup are propagated to the
> nested cgroups. However, when a new Landlock program is attached to one
> of this nested cgroup, this cgroup hierarchy fork the Landlock hooks.
> This design is simple and match the current CONFIG_BPF_CGROUP
> inheritance. The difference lie in the fact that Landlock programs can
> only be stacked but not removed. This match the append-only seccomp
> behavior. Userland is free to handle Landlock hooks attached to a cgroup
> in more complicated ways (e.g. continuous inheritance), but care should
> be taken to properly handle error cases (e.g. memory allocation errors).
>
> Changes since v2:
> * new design based on BPF_PROG_ATTACH (suggested by Alexei Starovoitov)
>
> Signed-off-by: Mickaël Salaün <m...@digikod.net>
> Cc: Alexei Starovoitov <a...@kernel.org>
> Cc: Andy Lutomirski <l...@amacapital.net>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: Daniel Mack <dan...@zonque.org>
> Cc: David S. Miller <da...@davemloft.net>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Tejun Heo <t...@kernel.org>
> Link: https://lkml.kernel.org/r/20160826021432.ga8...@ast-mbp.thefacebook.com
> Link: https://lkml.kernel.org/r/20160827204307.ga43...@ast-mbp.thefacebook.com
> ---
>  include/linux/bpf-cgroup.h  |  7 +++
>  include/linux/cgroup-defs.h |  2 ++
>  include/linux/landlock.h|  9 +
>  include/uapi/linux/bpf.h|  1 +
>  kernel/bpf/cgroup.c | 33 ++---
>  kernel/bpf/syscall.c| 11 +++
>  security/landlock/lsm.c | 40 +++-
>  security/landlock/manager.c | 32 
>  8 files changed, 131 insertions(+), 4 deletions(-)
>
> [...]
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 7b75fa692617..1c18fe46958a 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
>  EXPORT_SYMBOL(cgroup_bpf_enabled_key);
> @@ -31,7 +32,15 @@ void cgroup_bpf_put(struct cgroup *cgrp)
> union bpf_object pinned = cgrp->bpf.pinned[type];
>
> if (pinned.prog) {
> -   bpf_prog_put(pinned.prog);
> +   switch (type) {
> +   case BPF_CGROUP_LANDLOCK:
> +#ifdef CONFIG_SECURITY_LANDLOCK
> +   put_landlock_hooks(pinned.hooks);
> +   break;
> +#endif /* CONFIG_SECURITY_LANDLOCK */
> +   default:
> +   bpf_prog_put(pinned.prog);
> +   }
> static_branch_dec(&cgroup_bpf_enabled_key);
> }
> }

I get creeped out by type-controlled unions of pointers. :P I don't
have a suggestion to improve this, but I don't like seeing a pointer
type managed separately from the pointer itself as it tends to bypass
a lot of both static and dynamic checking. A union is better than a
cast of void *, but it still worries me. :)

-Kees

-- 
Kees Cook
Nexus Security


Re: [RFC v3 19/22] landlock: Add interrupted origin

2016-10-03 Thread Kees Cook
On Wed, Sep 14, 2016 at 6:19 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Wed, Sep 14, 2016 at 3:14 PM, Mickaël Salaün <m...@digikod.net> wrote:
>>
>> On 14/09/2016 20:29, Andy Lutomirski wrote:
>>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <m...@digikod.net> wrote:
>>>> This third origin of hook call should cover all possible trigger paths
>>>> (e.g. page fault). Landlock eBPF programs can then take decisions
>>>> accordingly.
>>>>
>>>> Signed-off-by: Mickaël Salaün <m...@digikod.net>
>>>> Cc: Alexei Starovoitov <a...@kernel.org>
>>>> Cc: Andy Lutomirski <l...@amacapital.net>
>>>> Cc: Daniel Borkmann <dan...@iogearbox.net>
>>>> Cc: Kees Cook <keesc...@chromium.org>
>>>> ---
>>>
>>>
>>>>
>>>> +   if (unlikely(in_interrupt())) {
>>>
>>> IMO security hooks have no business being called from interrupts.
>>> Aren't they all synchronous things done by tasks?  Interrupts are
>>> driver things.
>>>
>>> Are you trying to check for page faults and such?
>>
>> Yes, that was the idea you did put in my mind. Not sure how to deal with
>> this.
>>
>
> It's not so easy, unfortunately.  The easiest reliable way might be to
> set a TS_ flag on all syscall entries when TIF_SECCOMP or similar is
> set.

For making this series smaller, let's leave the idea of interrupt
hooks out -- the intention is for stricter syscall filtering, yes?

Once things are more well established and there's a use-case for this,
it can be added back in.

-Kees


-- 
Kees Cook
Nexus Security


Re: [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy

2016-10-03 Thread Kees Cook
On Wed, Sep 14, 2016 at 3:34 PM, Mickaël Salaün <m...@digikod.net> wrote:
>
> On 14/09/2016 20:43, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <m...@digikod.net> wrote:
>>> A Landlock program will be triggered according to its subtype/origin
>>> bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value will trigger the
>>> Landlock program when a seccomp filter will return RET_LANDLOCK.
>>> Moreover, it is possible to return a 16-bit cookie which will be
>>> readable by the Landlock programs in its context.
>>
>> Are you envisioning that the filters will return RET_LANDLOCK most of
>> the time or rarely?  If it's most of the time, then maybe this could
>> be simplified a bit by unconditionally calling the landlock filter and
>> letting the landlock filter access a struct seccomp_data if needed.
>
> Exposing seccomp_data in a Landlock context may be a good idea. The main
> implication is that Landlock programs may then be architecture specific
> (if dealing with data) as seccomp filters are. Another point is that it
> remove any direct binding between seccomp filters and Landlock programs.
> I will try this (more simple) approach.

Yeah, I would prefer that the seccomp code isn't doing list management
to identify the landlock hooks to trigger, etc. I think that's better
done on the LSM side. And since multiple seccomp filters could trigger
landlock, it may be best to just leave the low 16 bits unused
entirely. Then all state management is handled by the landlock eBPF
maps, not a value coming from seccomp that can get stomped on by new
filters, etc.
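
For context, a tiny C sketch of how a 32-bit filter return value splits into
an action part and the low 16 data bits discussed here. The HYP_ masks are
local names invented for the example, chosen to match the "16-bit cookie"
description in this thread rather than copied from a header.

```c
#include <stdint.h>

/* Hypothetical masks: high 16 bits select the action, low 16 bits carry
 * the optional data value (the "cookie"). */
#define HYP_RET_ACTION_MASK	0xffff0000U
#define HYP_RET_DATA_MASK	0x0000ffffU

static uint32_t hyp_ret_action(uint32_t ret)
{
	return ret & HYP_RET_ACTION_MASK;
}

static uint16_t hyp_ret_cookie(uint32_t ret)
{
	return (uint16_t)(ret & HYP_RET_DATA_MASK);
}
```

Since only a single 32-bit value comes back from the filter chain, there is
room for just one cookie, which is why state kept in eBPF maps is less
fragile than state carried in those bits.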

-Kees

-- 
Kees Cook
Nexus Security


[PATCH] isdn: Constify some function parameters

2016-12-16 Thread Kees Cook
From: Emese Revfy <re.em...@gmail.com>

The coming initify gcc plugin expects const pointer types, and caught
some __printf arguments that weren't const yet. This fixes those.

Signed-off-by: Emese Revfy <re.em...@gmail.com>
[kees: expanded commit message]
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/isdn/hisax/config.c | 16 
 drivers/isdn/hisax/hisax.h  |  4 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/isdn/hisax/config.c b/drivers/isdn/hisax/config.c
index bf04d2a3cf4a..2d12c6ceeb89 100644
--- a/drivers/isdn/hisax/config.c
+++ b/drivers/isdn/hisax/config.c
@@ -659,7 +659,7 @@ int jiftime(char *s, long mark)
 
 static u_char tmpbuf[HISAX_STATUS_BUFSIZE];
 
-void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt,
+void VHiSax_putstatus(struct IsdnCardState *cs, char *head, const char *fmt,
  va_list args)
 {
/* if head == NULL the fmt contains the full info */
@@ -669,23 +669,24 @@ void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt,
u_char  *p;
isdn_ctrl   ic;
int len;
+   const u_char*data;
 
if (!cs) {
printk(KERN_WARNING "HiSax: No CardStatus for message");
return;
}
spin_lock_irqsave(&cs->statlock, flags);
-   p = tmpbuf;
if (head) {
+   p = tmpbuf;
p += jiftime(p, jiffies);
p += sprintf(p, " %s", head);
p += vsprintf(p, fmt, args);
*p++ = '\n';
*p = 0;
len = p - tmpbuf;
-   p = tmpbuf;
+   data = tmpbuf;
} else {
-   p = fmt;
+   data = fmt;
len = strlen(fmt);
}
if (len > HISAX_STATUS_BUFSIZE) {
@@ -699,13 +700,12 @@ void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt,
if (i >= len)
i = len;
len -= i;
-   memcpy(cs->status_write, p, i);
+   memcpy(cs->status_write, data, i);
cs->status_write += i;
if (cs->status_write > cs->status_end)
cs->status_write = cs->status_buf;
-   p += i;
if (len) {
-   memcpy(cs->status_write, p, len);
+   memcpy(cs->status_write, data + i, len);
cs->status_write += len;
}
 #ifdef KERNELSTACK_DEBUG
@@ -729,7 +729,7 @@ void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt,
}
 }
 
-void HiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt, ...)
+void HiSax_putstatus(struct IsdnCardState *cs, char *head, const char *fmt, ...)
 {
va_list args;
 
diff --git a/drivers/isdn/hisax/hisax.h b/drivers/isdn/hisax/hisax.h
index 6ead6314e6d2..338d0408b377 100644
--- a/drivers/isdn/hisax/hisax.h
+++ b/drivers/isdn/hisax/hisax.h
@@ -1288,9 +1288,9 @@ int jiftime(char *s, long mark);
 int HiSax_command(isdn_ctrl *ic);
 int HiSax_writebuf_skb(int id, int chan, int ack, struct sk_buff *skb);
 __printf(3, 4)
-void HiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt, ...);
+void HiSax_putstatus(struct IsdnCardState *cs, char *head, const char *fmt, ...);
 __printf(3, 0)
-void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt, va_list args);
+void VHiSax_putstatus(struct IsdnCardState *cs, char *head, const char *fmt, va_list args);
 void HiSax_reportcard(int cardnr, int sel);
 int QuickHex(char *txt, u_char *p, int cnt);
 void LogFrame(struct IsdnCardState *cs, u_char *p, int size);
-- 
2.7.4


-- 
Kees Cook
Nexus Security


Re: [PATCH] net: use designated initializers

2016-12-17 Thread Kees Cook
On Sat, Dec 17, 2016 at 8:57 AM, David Miller <da...@davemloft.net> wrote:
> From: Kees Cook <keesc...@chromium.org>
> Date: Fri, 16 Dec 2016 16:58:58 -0800
>
>> Prepare to mark sensitive kernel structures for randomization by making
>> sure they're using designated initializers. These were identified during
>> allyesconfig builds of x86, arm, and arm64, with most initializer fixes
>> extracted from grsecurity.
>>
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>
> Applied, although "decnet: " would have been a much better
> subsystem prefix.

Thanks! Yeah, I had corrected that in my tree already in case there
was a v2 needed. I was working off an auto-splitting script that I
taught to guess at prefixes by looking at commit logs. It didn't work
very well. ;) We need another field in MAINTAINERS. :)

-Kees

-- 
Kees Cook
Nexus Security


Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants

2017-01-12 Thread Kees Cook
On Thu, Jan 12, 2017 at 7:37 AM, Michal Hocko <mho...@kernel.org> wrote:
> From: Michal Hocko <mho...@suse.com>
>
> There are many code paths opencoding kvmalloc. Let's use the helper
> instead. The main difference to kvmalloc is that those users are usually
> not considering all the aspects of the memory allocator. E.g. allocation
> requests < 64kB are basically never failing and invoke OOM killer to
> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - the vmalloc. On the other hand those
> requests might fallback to vmalloc even when the memory allocator would
> succeed after several more reclaim/compaction attempts previously. There
> is no guarantee something like that happens though.
>
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
>
> Cc: Martin Schwidefsky <schwidef...@de.ibm.com>
> Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
> Cc: Herbert Xu <herb...@gondor.apana.org.au>
> Cc: Anton Vorontsov <an...@enomsg.org>
> Cc: Colin Cross <ccr...@android.com>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Tony Luck <tony.l...@intel.com>
> Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
> Cc: Ben Skeggs <bske...@redhat.com>
> Cc: Kent Overstreet <kent.overstr...@gmail.com>
> Cc: Santosh Raspatur <sant...@chelsio.com>
> Cc: Hariprasad S <haripra...@chelsio.com>
> Cc: Tariq Toukan <tar...@mellanox.com>
> Cc: Yishai Hadas <yish...@mellanox.com>
> Cc: Dan Williams <dan.j.willi...@intel.com>
> Cc: Oleg Drokin <oleg.dro...@intel.com>
> Cc: Andreas Dilger <andreas.dil...@intel.com>
> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Cc: David Sterba <dste...@suse.com>
> Cc: "Yan, Zheng" <z...@redhat.com>
> Cc: Ilya Dryomov <idryo...@gmail.com>
> Cc: Alexander Viro <v...@zeniv.linux.org.uk>
> Cc: Alexei Starovoitov <a...@kernel.org>
> Cc: Eric Dumazet <eric.duma...@gmail.com>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Hocko <mho...@suse.com>
> ---
>  arch/s390/kvm/kvm-s390.c   | 10 ++-
>  crypto/lzo.c   |  4 +--
>  drivers/acpi/apei/erst.c   |  8 ++---
>  drivers/char/agp/generic.c |  8 +
>  drivers/gpu/drm/nouveau/nouveau_gem.c  |  4 +--
>  drivers/md/bcache/util.h   | 12 ++--
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h|  3 --
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--
>  drivers/net/ethernet/chelsio/cxgb3/l2t.c   |  2 +-
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 31 
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c |  9 ++
>  drivers/net/ethernet/mellanox/mlx4/mr.c|  9 ++
>  drivers/nvdimm/dimm_devs.c |  5 +---
>  .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +--
>  drivers/xen/evtchn.c   | 14 +
>  fs/btrfs/ctree.c   |  9 ++
>  fs/btrfs/ioctl.c   |  9 ++
>  fs/btrfs/send.c| 27 ++---
>  fs/ceph/file.c |  9 ++
>  fs/select.c|  5 +---
>  fs/xattr.c | 27 ++---
>  kernel/bpf/hashtab.c   | 11 ++-
>  lib/iov_iter.c |  5 +---
>  mm/frame_vector.c  |  5 +---
>  net/ipv4/inet_hashtables.c |  6 +---
>  net/ipv4/tcp_metrics.c |  5 +---
>  net/mpls/af_mpls.c |  5 +---
>  net/netfilter/x_tables.c   | 34 ++
>  net/netfilter/xt_recent.c  |  5 +---
>  net/sched/sch_choke.c  |  5 +---
>  net/sched/sch_fq_codel.c   | 26 -
>  net/sched/sch_hhf.c| 33 ++---
>  net/sched/sch_netem.c  |  6 +---
>  net/sched/sch_sfq.c|  6 +---
>  security/keys/keyctl.c | 22 --
>  35 files changed, 96 insertions(+), 319 deletions(-)
>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 4f74511015b8..e6bbb33d2956 100644
> --- a/arch/s390/kvm/kvm-s3

[PATCH] net: ping: check minimum size on ICMP header length

2016-12-02 Thread Kees Cook
Prior to commit c0371da6047a ("put iov_iter into msghdr") in v3.19, there
was no check that the iovec contained enough bytes for a icmp header,
and the read loop would walk across neighboring stack contents. Since
the iov_iter conversion, bad arguments are noticed, but the returned
error is EFAULT. Returning EMSGSIZE is a clearer fix and solves the
problem prior to v3.19.

This was found using trinity with KASAN on v3.18:

BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffc071077da0
Read of size 8 by task trinity-c2/9623
page:ffbe034b9a08 count:0 mapcount:0 mapping:  (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: GBU 3.18.0-dirty #15
Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
Call trace:
[] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
[] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
[< inline >] __dump_stack lib/dump_stack.c:15
[] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
[< inline >] print_address_description mm/kasan/report.c:147
[< inline >] kasan_report_error mm/kasan/report.c:236
[] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
[< inline >] check_memory_region mm/kasan/kasan.c:264
[] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
[] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
[< inline >] memcpy_from_msg include/linux/skbuff.h:2667
[] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
[] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
[] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
[< inline >] __sock_sendmsg_nosec net/socket.c:624
[< inline >] __sock_sendmsg net/socket.c:632
[] sock_sendmsg+0x124/0x164 net/socket.c:643
[< inline >] SYSC_sendto net/socket.c:1797
[] SyS_sendto+0x178/0x1d8 net/socket.c:1761

CVE-2016-8399

Reported-by: Qidan He <i...@flanker017.me>
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 net/ipv4/ping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 205e2000d395..8257be3f032c 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -654,7 +654,7 @@ int ping_common_sendmsg(int family, struct msghdr *msg, size_t len,
void *user_icmph, size_t icmph_len) {
u8 type, code;
 
-   if (len > 0xFFFF)
+   if (len > 0xFFFF || len < icmph_len)
return -EMSGSIZE;
 
/*
-- 
2.7.4


-- 
Kees Cook
Nexus Security


[PATCH v2] net: ping: check minimum size on ICMP header length

2016-12-05 Thread Kees Cook
Prior to commit c0371da6047a ("put iov_iter into msghdr") in v3.19, there
was no check that the iovec contained enough bytes for an ICMP header,
and the read loop would walk across neighboring stack contents. Since the
iov_iter conversion, bad arguments are noticed, but the returned error is
EFAULT. Returning EINVAL is a clearer error and also solves the problem
prior to v3.19.

This was found using trinity with KASAN on v3.18:

BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffc071077da0
Read of size 8 by task trinity-c2/9623
page:ffbe034b9a08 count:0 mapcount:0 mapping:  (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: GBU 3.18.0-dirty #15
Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
Call trace:
[] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
[] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
[< inline >] __dump_stack lib/dump_stack.c:15
[] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
[< inline >] print_address_description mm/kasan/report.c:147
[< inline >] kasan_report_error mm/kasan/report.c:236
[] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
[< inline >] check_memory_region mm/kasan/kasan.c:264
[] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
[] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
[< inline >] memcpy_from_msg include/linux/skbuff.h:2667
[] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
[] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
[] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
[< inline >] __sock_sendmsg_nosec net/socket.c:624
[< inline >] __sock_sendmsg net/socket.c:632
[] sock_sendmsg+0x124/0x164 net/socket.c:643
[< inline >] SYSC_sendto net/socket.c:1797
[] SyS_sendto+0x178/0x1d8 net/socket.c:1761

CVE-2016-8399

Reported-by: Qidan He <i...@flanker017.me>
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
v2: return -EINVAL, Lorenzo.
---
 net/ipv4/ping.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 205e2000d395..96b8e2b95731 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -657,6 +657,10 @@ int ping_common_sendmsg(int family, struct msghdr *msg, size_t len,
if (len > 0xFFFF)
return -EMSGSIZE;
 
+   /* Must have at least a full ICMP header. */
+   if (len < icmph_len)
+   return -EINVAL;
+
/*
 *  Check the flags.
 */
-- 
2.7.4


-- 
Kees Cook
Nexus Security
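
As a standalone illustration of the two bounds checks in the v2 patch above, the logic can be sketched like this. `check_ping_len()` is a hypothetical helper written for this example, not a kernel function; it only mirrors the order of the checks and the error codes the patch returns.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the length checks at the top of
 * ping_common_sendmsg(): reject oversized payloads first, then
 * payloads too short to hold a complete ICMP header.
 */
static int check_ping_len(size_t len, size_t icmph_len)
{
	if (len > 0xFFFF)
		return -EMSGSIZE;

	/* Must have at least a full ICMP header. */
	if (len < icmph_len)
		return -EINVAL;

	return 0;
}
```

With an 8-byte header, a 4-byte message yields -EINVAL before any copy from the caller's buffer would take place.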


[PATCH] isdn: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/isdn/i4l/isdn_concap.c   |  6 +++---
 drivers/isdn/i4l/isdn_x25iface.c | 16 
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/isdn/i4l/isdn_concap.c b/drivers/isdn/i4l/isdn_concap.c
index 91d57304d4d3..336523ec077c 100644
--- a/drivers/isdn/i4l/isdn_concap.c
+++ b/drivers/isdn/i4l/isdn_concap.c
@@ -80,9 +80,9 @@ static int isdn_concap_dl_disconn_req(struct concap_proto *concap)
 }
 
 struct concap_device_ops isdn_concap_reliable_dl_dops = {
-   &isdn_concap_dl_data_req,
-   &isdn_concap_dl_connect_req,
-   &isdn_concap_dl_disconn_req
+   .data_req = &isdn_concap_dl_data_req,
+   .connect_req = &isdn_concap_dl_connect_req,
+   .disconn_req = &isdn_concap_dl_disconn_req
 };
 
 /* The following should better go into a dedicated source file such that
diff --git a/drivers/isdn/i4l/isdn_x25iface.c b/drivers/isdn/i4l/isdn_x25iface.c
index 0c5d8de41b23..ba60076e0b95 100644
--- a/drivers/isdn/i4l/isdn_x25iface.c
+++ b/drivers/isdn/i4l/isdn_x25iface.c
@@ -53,14 +53,14 @@ static int isdn_x25iface_disconn_ind(struct concap_proto *);
 
 
 static struct concap_proto_ops ix25_pops = {
-   &isdn_x25iface_proto_new,
-   &isdn_x25iface_proto_del,
-   &isdn_x25iface_proto_restart,
-   &isdn_x25iface_proto_close,
-   &isdn_x25iface_xmit,
-   &isdn_x25iface_receive,
-   &isdn_x25iface_connect_ind,
-   &isdn_x25iface_disconn_ind
+   .proto_new = &isdn_x25iface_proto_new,
+   .proto_del = &isdn_x25iface_proto_del,
+   .restart = &isdn_x25iface_proto_restart,
+   .close = &isdn_x25iface_proto_close,
+   .encap_and_xmit = &isdn_x25iface_xmit,
+   .data_ind = &isdn_x25iface_receive,
+   .connect_ind = &isdn_x25iface_connect_ind,
+   .disconn_ind = &isdn_x25iface_disconn_ind
 };
 
 /* error message helper function */
-- 
2.7.4


-- 
Kees Cook
Nexus Security
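
To illustrate why positional initializers get in the way of structure layout randomization, here is a small sketch. The two struct types are hypothetical and simply model "the same ops structure before and after its fields are reordered"; nothing here is taken from the ISDN code.

```c
/* Hypothetical ops table in its declared field order. */
struct ops_declared {
	int (*open)(void);
	int (*close)(void);
};

/* The same table after a (simulated) layout shuffle. */
struct ops_shuffled {
	int (*close)(void);
	int (*open)(void);
};

static int do_open(void)  { return 1; }
static int do_close(void) { return 2; }

/* Positional: correct only for the declared order. */
static const struct ops_declared positional_ok = { do_open, do_close };

/* Positional against the shuffled layout: compiles, but .open is do_close. */
static const struct ops_shuffled positional_bad = { do_open, do_close };

/* Designated: correct regardless of field order. */
static const struct ops_shuffled designated_ok = {
	.open = do_open,
	.close = do_close,
};
```

Because both members have the same function-pointer type, the compiler has no reason to complain about `positional_bad`, which is the failure mode the designated form avoids.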


[PATCH] bna: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/net/ethernet/brocade/bna/bna_enet.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bna_enet.c b/drivers/net/ethernet/brocade/bna/bna_enet.c
index 4e5c3874a50f..bba81735ce87 100644
--- a/drivers/net/ethernet/brocade/bna/bna_enet.c
+++ b/drivers/net/ethernet/brocade/bna/bna_enet.c
@@ -1676,10 +1676,10 @@ bna_cb_ioceth_reset(void *arg)
 }
 
 static struct bfa_ioc_cbfn bna_ioceth_cbfn = {
-   bna_cb_ioceth_enable,
-   bna_cb_ioceth_disable,
-   bna_cb_ioceth_hbfail,
-   bna_cb_ioceth_reset
+   .enable_cbfn = bna_cb_ioceth_enable,
+   .disable_cbfn = bna_cb_ioceth_disable,
+   .hbfail_cbfn = bna_cb_ioceth_hbfail,
+   .reset_cbfn = bna_cb_ioceth_reset
 };
 
 static void bna_attr_init(struct bna_ioceth *ioceth)
-- 
2.7.4


-- 
Kees Cook
Nexus Security


[PATCH] ATM: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 net/atm/lec.c|  6 ++--
 net/atm/mpoa_caches.c| 43 ++--
 net/vmw_vsock/vmci_transport_notify.c| 30 +--
 net/vmw_vsock/vmci_transport_notify_qstate.c | 30 +--
 4 files changed, 54 insertions(+), 55 deletions(-)

diff --git a/net/atm/lec.c b/net/atm/lec.c
index 779b3fa6052d..019557d0a11d 100644
--- a/net/atm/lec.c
+++ b/net/atm/lec.c
@@ -111,9 +111,9 @@ static inline void lec_arp_put(struct lec_arp_table *entry)
 }
 
 static struct lane2_ops lane2_ops = {
-   lane2_resolve,  /* resolve, spec 3.1.3 */
-   lane2_associate_req,/* associate_req,   spec 3.1.4 */
-   NULL/* associate indicator, spec 3.1.5 */
+   .resolve = lane2_resolve,   /* spec 3.1.3 */
+   .associate_req = lane2_associate_req,   /* spec 3.1.4 */
+   .associate_indicator = NULL /* spec 3.1.5 */
 };
 
 static unsigned char bus_mac[ETH_ALEN] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
diff --git a/net/atm/mpoa_caches.c b/net/atm/mpoa_caches.c
index 9e60e74c807d..a89fdebeffda 100644
--- a/net/atm/mpoa_caches.c
+++ b/net/atm/mpoa_caches.c
@@ -535,33 +535,32 @@ static void eg_destroy_cache(struct mpoa_client *mpc)
 
 
 static const struct in_cache_ops ingress_ops = {
-   in_cache_add_entry,   /* add_entry   */
-   in_cache_get, /* get */
-   in_cache_get_with_mask,   /* get_with_mask   */
-   in_cache_get_by_vcc,  /* get_by_vcc  */
-   in_cache_put, /* put */
-   in_cache_remove_entry,/* remove_entry*/
-   cache_hit,/* cache_hit   */
-   clear_count_and_expired,  /* clear_count */
-   check_resolving_entries,  /* check_resolving */
-   refresh_entries,  /* refresh */
-   in_destroy_cache  /* destroy_cache   */
+   .add_entry = in_cache_add_entry,
+   .get = in_cache_get,
+   .get_with_mask = in_cache_get_with_mask,
+   .get_by_vcc = in_cache_get_by_vcc,
+   .put = in_cache_put,
+   .remove_entry = in_cache_remove_entry,
+   .cache_hit = cache_hit,
+   .clear_count = clear_count_and_expired,
+   .check_resolving = check_resolving_entries,
+   .refresh = refresh_entries,
+   .destroy_cache = in_destroy_cache
 };
 
 static const struct eg_cache_ops egress_ops = {
-   eg_cache_add_entry,   /* add_entry*/
-   eg_cache_get_by_cache_id, /* get_by_cache_id  */
-   eg_cache_get_by_tag,  /* get_by_tag   */
-   eg_cache_get_by_vcc,  /* get_by_vcc   */
-   eg_cache_get_by_src_ip,   /* get_by_src_ip*/
-   eg_cache_put, /* put  */
-   eg_cache_remove_entry,/* remove_entry */
-   update_eg_cache_entry,/* update   */
-   clear_expired,/* clear_expired*/
-   eg_destroy_cache  /* destroy_cache*/
+   .add_entry = eg_cache_add_entry,
+   .get_by_cache_id = eg_cache_get_by_cache_id,
+   .get_by_tag = eg_cache_get_by_tag,
+   .get_by_vcc = eg_cache_get_by_vcc,
+   .get_by_src_ip = eg_cache_get_by_src_ip,
+   .put = eg_cache_put,
+   .remove_entry = eg_cache_remove_entry,
+   .update = update_eg_cache_entry,
+   .clear_expired = clear_expired,
+   .destroy_cache = eg_destroy_cache
 };
 
-
 void atm_mpoa_init_cache(struct mpoa_client *mpc)
 {
mpc->in_ops = &ingress_ops;
diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index fd8cf0214d51..1406db4d97d1 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -662,19 +662,19 @@ static void vmci_transport_notify_pkt_process_negotiate(struct sock *sk)
 
 /* Socket control packet based operations. */
 const struct vmci_transport_notify_ops vmci_transport_notify_pkt_ops = {
-   vmci_transport_notify_pkt_socket_init,
-   vmci_transport_notify_pkt_socket_destruct,
-   vmci_transport_notify_pkt_poll_in,
-   vmci_transport_notify_pkt_poll_out,
-   vmci_transport_notify_pkt_handle_pkt,
-   vmci_transport_notify_pkt_recv_init,
-   vmci_transport_notify_pkt_recv_pre_block,
-   vmci_transport_notify_pkt_recv_pre_dequeue,
-   vmci_transport_notify_pkt_recv_post_dequeue,
-   vmci_transport_notify_pkt_send_init,
-   vmci_transport_notify_pkt_send_

[PATCH] net: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 net/decnet/dn_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index b2c26b081134..41f803e35da3 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -201,7 +201,7 @@ static struct dn_dev_sysctl_table {
.extra1 = _t3,
.extra2 = _t3
},
-   {0}
+   { }
},
 };
 
-- 
2.7.4


-- 
Kees Cook
Nexus Security
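
The `{0}` to `{ }` change above concerns the terminating entry of a table. A minimal sketch of that pattern, using a hypothetical `struct entry` rather than the real sysctl types, might look like the following. Note that the empty-brace initializer is GNU C syntax (standardized only in C23), which is an assumption about the compiler in use.

```c
#include <stddef.h>

/* Hypothetical table entry; the real ctl_table has many more fields. */
struct entry {
	const char *name;
	int mode;
};

static const struct entry table[] = {
	{ .name = "alpha", .mode = 0644 },
	{ .name = "beta",  .mode = 0444 },
	{ }	/* sentinel: every field zeroed, no field named positionally */
};

/* Walk the table until the all-zero sentinel is reached. */
static size_t count_entries(const struct entry *t)
{
	size_t n = 0;

	while (t[n].name != NULL)
		n++;
	return n;
}
```

Unlike `{0}`, the empty form does not assign a literal to whichever field happens to come first, so it stays valid if the field order changes.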


[PATCH] isdn/gigaset: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/isdn/gigaset/bas-gigaset.c | 32 
 drivers/isdn/gigaset/ser-gigaset.c | 32 
 drivers/isdn/gigaset/usb-gigaset.c | 32 
 3 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/drivers/isdn/gigaset/bas-gigaset.c b/drivers/isdn/gigaset/bas-gigaset.c
index aecec6d32463..11e13c56126f 100644
--- a/drivers/isdn/gigaset/bas-gigaset.c
+++ b/drivers/isdn/gigaset/bas-gigaset.c
@@ -2565,22 +2565,22 @@ static int gigaset_post_reset(struct usb_interface *intf)
 
 
 static const struct gigaset_ops gigops = {
-   gigaset_write_cmd,
-   gigaset_write_room,
-   gigaset_chars_in_buffer,
-   gigaset_brkchars,
-   gigaset_init_bchannel,
-   gigaset_close_bchannel,
-   gigaset_initbcshw,
-   gigaset_freebcshw,
-   gigaset_reinitbcshw,
-   gigaset_initcshw,
-   gigaset_freecshw,
-   gigaset_set_modem_ctrl,
-   gigaset_baud_rate,
-   gigaset_set_line_ctrl,
-   gigaset_isoc_send_skb,
-   gigaset_isoc_input,
+   .write_cmd = gigaset_write_cmd,
+   .write_room = gigaset_write_room,
+   .chars_in_buffer = gigaset_chars_in_buffer,
+   .brkchars = gigaset_brkchars,
+   .init_bchannel = gigaset_init_bchannel,
+   .close_bchannel = gigaset_close_bchannel,
+   .initbcshw = gigaset_initbcshw,
+   .freebcshw = gigaset_freebcshw,
+   .reinitbcshw = gigaset_reinitbcshw,
+   .initcshw = gigaset_initcshw,
+   .freecshw = gigaset_freecshw,
+   .set_modem_ctrl = gigaset_set_modem_ctrl,
+   .baud_rate = gigaset_baud_rate,
+   .set_line_ctrl = gigaset_set_line_ctrl,
+   .send_skb = gigaset_isoc_send_skb,
+   .handle_input = gigaset_isoc_input,
 };
 
 /* bas_gigaset_init
diff --git a/drivers/isdn/gigaset/ser-gigaset.c b/drivers/isdn/gigaset/ser-gigaset.c
index b90776ef56ec..ab0b63a4d045 100644
--- a/drivers/isdn/gigaset/ser-gigaset.c
+++ b/drivers/isdn/gigaset/ser-gigaset.c
@@ -445,22 +445,22 @@ static int gigaset_set_line_ctrl(struct cardstate *cs, unsigned cflag)
 }
 
 static const struct gigaset_ops ops = {
-   gigaset_write_cmd,
-   gigaset_write_room,
-   gigaset_chars_in_buffer,
-   gigaset_brkchars,
-   gigaset_init_bchannel,
-   gigaset_close_bchannel,
-   gigaset_initbcshw,
-   gigaset_freebcshw,
-   gigaset_reinitbcshw,
-   gigaset_initcshw,
-   gigaset_freecshw,
-   gigaset_set_modem_ctrl,
-   gigaset_baud_rate,
-   gigaset_set_line_ctrl,
-   gigaset_m10x_send_skb,  /* asyncdata.c */
-   gigaset_m10x_input, /* asyncdata.c */
+   .write_cmd = gigaset_write_cmd,
+   .write_room = gigaset_write_room,
+   .chars_in_buffer = gigaset_chars_in_buffer,
+   .brkchars = gigaset_brkchars,
+   .init_bchannel = gigaset_init_bchannel,
+   .close_bchannel = gigaset_close_bchannel,
+   .initbcshw = gigaset_initbcshw,
+   .freebcshw = gigaset_freebcshw,
+   .reinitbcshw = gigaset_reinitbcshw,
+   .initcshw = gigaset_initcshw,
+   .freecshw = gigaset_freecshw,
+   .set_modem_ctrl = gigaset_set_modem_ctrl,
+   .baud_rate = gigaset_baud_rate,
+   .set_line_ctrl = gigaset_set_line_ctrl,
+   .send_skb = gigaset_m10x_send_skb,  /* asyncdata.c */
+   .handle_input = gigaset_m10x_input, /* asyncdata.c */
 };
 
 
diff --git a/drivers/isdn/gigaset/usb-gigaset.c b/drivers/isdn/gigaset/usb-gigaset.c
index 5f306e2eece5..eade36dafa34 100644
--- a/drivers/isdn/gigaset/usb-gigaset.c
+++ b/drivers/isdn/gigaset/usb-gigaset.c
@@ -862,22 +862,22 @@ static int gigaset_pre_reset(struct usb_interface *intf)
 }
 
 static const struct gigaset_ops ops = {
-   gigaset_write_cmd,
-   gigaset_write_room,
-   gigaset_chars_in_buffer,
-   gigaset_brkchars,
-   gigaset_init_bchannel,
-   gigaset_close_bchannel,
-   gigaset_initbcshw,
-   gigaset_freebcshw,
-   gigaset_reinitbcshw,
-   gigaset_initcshw,
-   gigaset_freecshw,
-   gigaset_set_modem_ctrl,
-   gigaset_baud_rate,
-   gigaset_set_line_ctrl,
-   gigaset_m10x_send_skb,
-   gigaset_m10x_input,
+   .write_cmd = gigaset_write_cmd,
+   .write_room = gigaset_write_room,
+   .chars_in_buffer = gigaset_chars_in_buffer,
+   .brkchars = gigaset_brkchars,
+   .init_bchannel = gigaset_init_bchannel,
+   .close_bchannel = gigaset_close_bchannel,
+   .initbcshw = gigaset_initbcshw,
+   .freebcshw = gigaset_freebcshw,
+   .reinitbcshw = gigaset_reinitbcshw,
+   .initcshw = gigaset_initcshw,
+   .freecshw = gigaset_freecshw,
+   .set_mode

[PATCH] WAN: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/net/wan/lmc/lmc_media.c | 97 +
 1 file changed, 49 insertions(+), 48 deletions(-)

diff --git a/drivers/net/wan/lmc/lmc_media.c b/drivers/net/wan/lmc/lmc_media.c
index 5920c996fcdf..ff2e4a5654c7 100644
--- a/drivers/net/wan/lmc/lmc_media.c
+++ b/drivers/net/wan/lmc/lmc_media.c
@@ -95,62 +95,63 @@ static inline void write_av9110_bit (lmc_softc_t *, int);
 static void write_av9110(lmc_softc_t *, u32, u32, u32, u32, u32);
 
 lmc_media_t lmc_ds3_media = {
-  lmc_ds3_init,/* special media init stuff */
-  lmc_ds3_default, /* reset to default state */
-  lmc_ds3_set_status,  /* reset status to state provided */
-  lmc_dummy_set_1, /* set clock source */
-  lmc_dummy_set2_1,/* set line speed */
-  lmc_ds3_set_100ft,   /* set cable length */
-  lmc_ds3_set_scram,   /* set scrambler */
-  lmc_ds3_get_link_status, /* get link status */
-  lmc_dummy_set_1, /* set link status */
-  lmc_ds3_set_crc_length,  /* set CRC length */
-  lmc_dummy_set_1, /* set T1 or E1 circuit type */
-  lmc_ds3_watchdog
+  .init = lmc_ds3_init, /* special media init stuff */
+  .defaults = lmc_ds3_default, /* reset to default state */
+  .set_status = lmc_ds3_set_status, /* reset status to state provided */
+  .set_clock_source = lmc_dummy_set_1, /* set clock source */
+  .set_speed = lmc_dummy_set2_1,   /* set line speed */
+  .set_cable_length = lmc_ds3_set_100ft,   /* set cable length */
+  .set_scrambler = lmc_ds3_set_scram,  /* set scrambler */
+  .get_link_status = lmc_ds3_get_link_status,  /* get link status */
+  .set_link_status = lmc_dummy_set_1,  /* set link status */
+  .set_crc_length = lmc_ds3_set_crc_length,/* set CRC length */
+  .set_circuit_type = lmc_dummy_set_1, /* set T1 or E1 circuit type */
+  .watchdog = lmc_ds3_watchdog
 };
 
 lmc_media_t lmc_hssi_media = {
-  lmc_hssi_init,   /* special media init stuff */
-  lmc_hssi_default,/* reset to default state */
-  lmc_hssi_set_status, /* reset status to state provided */
-  lmc_hssi_set_clock,  /* set clock source */
-  lmc_dummy_set2_1,/* set line speed */
-  lmc_dummy_set_1, /* set cable length */
-  lmc_dummy_set_1, /* set scrambler */
-  lmc_hssi_get_link_status,/* get link status */
-  lmc_hssi_set_link_status,/* set link status */
-  lmc_hssi_set_crc_length, /* set CRC length */
-  lmc_dummy_set_1, /* set T1 or E1 circuit type */
-  lmc_hssi_watchdog
+  .init = lmc_hssi_init,   /* special media init stuff */
+  .defaults = lmc_hssi_default, /* reset to default state */
+  .set_status = lmc_hssi_set_status,   /* reset status to state provided */
+  .set_clock_source = lmc_hssi_set_clock,  /* set clock source */
+  .set_speed = lmc_dummy_set2_1,   /* set line speed */
+  .set_cable_length = lmc_dummy_set_1, /* set cable length */
+  .set_scrambler = lmc_dummy_set_1,/* set scrambler */
+  .get_link_status = lmc_hssi_get_link_status, /* get link status */
+  .set_link_status = lmc_hssi_set_link_status, /* set link status */
+  .set_crc_length = lmc_hssi_set_crc_length,   /* set CRC length */
+  .set_circuit_type = lmc_dummy_set_1, /* set T1 or E1 circuit type */
+  .watchdog = lmc_hssi_watchdog
 };
 
-lmc_media_t lmc_ssi_media = { lmc_ssi_init,/* special media init stuff */
-  lmc_ssi_default, /* reset to default state */
-  lmc_ssi_set_status,  /* reset status to state provided */
-  lmc_ssi_set_clock,   /* set clock source */
-  lmc_ssi_set_speed,   /* set line speed */
-  lmc_dummy_set_1, /* set cable length */
-  lmc_dummy_set_1, /* set scrambler */
-  lmc_ssi_get_link_status, /* get link status */
-  lmc_ssi_set_link_status, /* set link status */
-  lmc_ssi_set_crc_length,  /* set CRC length */
-  lmc_dummy_set_1, /* set T1 or E1 circuit type */
-  lmc_ssi_watchdog
+lmc_media_t lmc_ssi_media = {
+  .init = lmc_ssi_init, /* special media init stuff */
+  .defaults = lmc_ssi_default, /* reset to default state */
+  .set_status = lmc_ssi_set_status, /* reset status to state provided */
+  .set_clock_source = lmc_ssi_set_clock,   /* set clock source */
+  .set_speed = lmc_ssi_set_speed,  /* set line speed */
+  .set_cable_length = lmc_dummy

[PATCH] net/x25: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 net/x25/sysctl_net_x25.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/x25/sysctl_net_x25.c b/net/x25/sysctl_net_x25.c
index 43239527a205..a06dfe143c67 100644
--- a/net/x25/sysctl_net_x25.c
+++ b/net/x25/sysctl_net_x25.c
@@ -70,7 +70,7 @@ static struct ctl_table x25_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
-   { 0, },
+   { },
 };
 
 void __init x25_register_sysctl(void)
-- 
2.7.4


-- 
Kees Cook
Nexus Security


Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-22 Thread Kees Cook
On Tue, Mar 21, 2017 at 7:03 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Tue, 2017-03-21 at 16:51 -0700, Kees Cook wrote:
>
>> Am I understanding you correctly that you'd want something like:
>>
>> refcount.h:
>> #ifdef UNPROTECTED_REFCOUNT
>> #define refcount_inc(x)   atomic_inc(x)
>> ...
>> #else
>> void refcount_inc(...
>> ...
>> #endif
>>
>> some/net.c:
>> #define UNPROTECTED_REFCOUNT
>> #include <refcount.h>
>>
>> or similar?
>
> At first, it could be something simple like that yes.
>
> Note that we might define two refcount_inc()  : One that does whole
> tests, and refcount_inc_relaxed() that might translate to atomic_inc()
> on non debug kernels.
>
> Then later, maybe provide a dynamic infrastructure so that we can
> dynamically force the full checks even for refcount_inc_relaxed() on say
> 1% of the hosts, to get better debug coverage ?

Well, this isn't about finding bugs in normal workflows. This is about
catching bugs that attackers have found and start exploiting to gain a
use-after-free primitive. The intention is for it to be always
enabled.

-Kees

-- 
Kees Cook
Pixel Security
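
The compile-time opt-out floated in the exchange above can be sketched in userspace with C11 atomics. This is an approximation under stated assumptions: the `sketch_` names are hypothetical, and the checked path only models saturation at `UINT_MAX`; it is not the kernel's `refcount_t` implementation.

```c
#include <limits.h>
#include <stdatomic.h>

/*
 * Checked increment: a compare-and-swap loop that pins the counter at
 * UINT_MAX rather than letting it wrap to 0.
 */
static void sketch_refcount_inc_checked(atomic_uint *r)
{
	unsigned int old = atomic_load_explicit(r, memory_order_relaxed);

	do {
		if (old == UINT_MAX)
			return;	/* saturated: leave the counter pinned */
	} while (!atomic_compare_exchange_weak_explicit(r, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
}

#ifdef UNPROTECTED_REFCOUNT
/* Opt-out: a single atomic add, which wraps silently on overflow. */
#define sketch_refcount_inc(r) \
	((void)atomic_fetch_add_explicit((r), 1, memory_order_relaxed))
#else
#define sketch_refcount_inc(r) sketch_refcount_inc_checked(r)
#endif
```

Built without `UNPROTECTED_REFCOUNT`, an increment at `UINT_MAX` leaves the counter where it is; built with it, the same call wraps to 0, which is the overflow that can lead to a premature free.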


Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-16 Thread Kees Cook
On Thu, Mar 16, 2017 at 10:58 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2017-03-16 at 17:28 +0200, Elena Reshetova wrote:
>> refcount_t type and corresponding API should be
>> used instead of atomic_t when the variable is used as
>> a reference counter. This allows to avoid accidental
>> refcounter overflows that might lead to use-after-free
>> situations.
>
>
> ...
>
>>  static __always_inline void sock_hold(struct sock *sk)
>>  {
>> - atomic_inc(&sk->sk_refcnt);
>> + refcount_inc(&sk->sk_refcnt);
>>  }
>>
>
> While I certainly see the value of these refcount_t, we have a very
> different behavior on these atomic_inc() which were doing a single
> inlined LOCK RMW on x86.

I think we can certainly investigate arch-specific ways to improve the
performance, but the consensus seemed to be that getting the
infrastructure in and doing the migration was the first set of steps.

> We now call an external function performing an
> atomic_read(), various ops/tests, then atomic_cmpxchg_relaxed(), in a
> loop, losing the nice ability for x86 of preventing live locks.
>
> Looks a lot of bloat, just to be able to chase hypothetical bugs in the
> kernel.
>
> I would love to have a way to enable extra debugging when I want a debug
> kernel, like LOCKDEP or KASAN.
>
> By adding all this bloat, we assert linux kernel is terminally buggy and
> every atomic_inc() we did was suspicious, and need to be always
> instrumented/validated.

This IS the assertion, unfortunately. With average 5 year lifetimes on
security flaws[1], and many of the last couple years' public exploits
being refcount flaws[2], this is something we have to get done. We
need the default kernel to be much more self-protective, and this is
one of many places to make it happen.

I am, of course, biased, but I think the evidence of actual
refcounting attacks outweighs the theoretical performance cost of
these changes. If there is a realistic workflow that shows actual
problems, let's examine it and find a solution for that as a separate
part of this work without blocking this migration.

-Kees

[1] https://outflux.net/blog/archives/2016/10/18/security-bug-lifetime/
[2] http://kernsec.org/wiki/index.php/Bug_Classes/Integer_overflow

-- 
Kees Cook
Pixel Security


Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-21 Thread Kees Cook
On Mon, Mar 20, 2017 at 6:40 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
>> On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
>> >
>> > So what bench/setup do you want ran?
>>
>> You can start by counting how many cycles an atomic op takes
>> vs. how many cycles this new code takes.
>
> On what uarch?
>
> I think I tested hand coded asm version and it ended up about double the
> cycles for a cmpxchg loop vs the direct instruction on an IVB-EX (until
> the memory bus saturated, at which point they took the same). Newer
> parts will of course have different numbers,
>
> Can't we run some iperf on a 40gbe fiber loop or something? It would be
> very useful to have an actual workload we can run.

Yeah, this is exactly what I'd like to find as well. Just comparing
cycles between refcount implementations, while interesting, doesn't
show us real-world performance changes, which is what we need to
measure.

Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
elsewhere in this email thread) real-world meaningful enough?

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-21 Thread Kees Cook
On Tue, Mar 21, 2017 at 2:23 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Tue, 2017-03-21 at 13:49 -0700, Kees Cook wrote:
>
>> Yeah, this is exactly what I'd like to find as well. Just comparing
>> cycles between refcount implementations, while interesting, doesn't
>> show us real-world performance changes, which is what we need to
>> measure.
>>
>> Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
>> elsewhere in this email thread) real-world meaningful enough?
>
> Not at all ;)
>
> This was targeting the specific change I had in mind for
> ip_idents_reserve(), which is not used by TCP flows.

Okay, I just wanted to check. I didn't think so, but it was the only
example in the thread.

> Unfortunately there is no good test simulating real-world workloads,
> which are mostly using TCP flows.

Sure, but there has to be _something_ that can be used to measure
the effects. Without a meaningful test, it's weird to reject a
change for performance reasons.

> Most synthetic tools you can find are not using epoll(), and very often
> hit bottlenecks in other layers.
>
>
> It looks like our suggestion to get kernel builds with atomic_inc()
> being exactly an atomic_inc() is not even discussed or implemented.

So, FWIW, I originally tried to make this a CONFIG in the first couple
passes at getting a refcount defense. I would be fine with this, but I
was not able to convince Peter. :) However, things have evolved a lot
since then, so perhaps there are things to be done here.

> Coding this would require less time than running a typical Google kernel
> qualification (roughly one month, thousands of hosts..., days of SWE).

It wasn't the issue of coding time; just that it had been specifically
not wanted. :)

Am I understanding you correctly that you'd want something like:

refcount.h:
#ifdef UNPROTECTED_REFCOUNT
#define refcount_inc(x)   atomic_inc(x)
...
#else
void refcount_inc(...
...
#endif

some/net.c:
#define UNPROTECTED_REFCOUNT
#include <refcount.h>

or similar?
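
To make the question concrete, here is a minimal userspace sketch of that opt-out. It assumes C11 atomics in place of the kernel's atomic_t, and the saturate-at-UINT_MAX behaviour and the refcount_read() helper are illustrative only, not the actual refcount_t implementation:

```c
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's refcount_t. */
typedef struct {
    atomic_uint refs;
} refcount_t;

#ifdef UNPROTECTED_REFCOUNT
/* Opt-out: refcount_inc() is exactly one atomic increment. */
static inline void refcount_inc(refcount_t *r)
{
    atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);
}
#else
/* Protected: a cmpxchg loop that refuses to wrap past UINT_MAX. */
static inline void refcount_inc(refcount_t *r)
{
    unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

    do {
        if (old == UINT_MAX)
            return; /* saturated: stay pinned instead of wrapping to 0 */
    } while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
}
#endif

/* Read the current count (illustrative helper). */
static inline unsigned int refcount_read(refcount_t *r)
{
    return atomic_load_explicit(&r->refs, memory_order_relaxed);
}
```

A file that defines UNPROTECTED_REFCOUNT before this point gets the bare increment; everything else gets the checked loop, which is the per-file choice being asked about.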

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] ebpf: verify the output of the JIT

2017-04-04 Thread Kees Cook
On Tue, Apr 4, 2017 at 3:08 PM, Tycho Andersen <ty...@docker.com> wrote:
> The goal of this patch is to protect the JIT against an attacker with a
> write-in-memory primitive. The JIT allocates a buffer which will eventually
> be marked +x, so we need to make sure that what was written to this buffer
> is what was intended.
>
> We achieve this by building a hash of the instruction buffer as
> instructions are emitted and then comparing that to a hash at the end of
> the JIT compile after the buffer has been marked read-only.
>
> Signed-off-by: Tycho Andersen <ty...@docker.com>
> CC: Daniel Borkmann <dan...@iogearbox.net>
> CC: Alexei Starovoitov <a...@kernel.org>
> CC: Kees Cook <keesc...@chromium.org>
> CC: Mickaël Salaün <m...@digikod.net>

Cool! This closes the race condition on producing the JIT vs going
read-only. I wonder if it might be possible to make this a more
generic interface to BPF which would allocate the hash, provide
the update callback during emit, and then do the hash check itself at
the end of bpf_jit_binary_lock_ro()?
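
Roughly what I have in mind, as a userspace sketch only: FNV-1a stands in for the SHA-256 shash used by the patch, and struct jit_hash, jit_hash_init(), jit_hash_update() and jit_hash_verify() are made-up names, not existing kernel interfaces:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running hash; FNV-1a stands in for the patch's SHA-256. */
struct jit_hash {
    uint64_t state;
};

static void jit_hash_init(struct jit_hash *h)
{
    h->state = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
}

static void jit_hash_update(struct jit_hash *h, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h->state ^= data[i];
        h->state *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
    }
}

/* Emit the low `len` bytes of `bytes` (little-endian) and hash what was intended. */
static uint8_t *emit_code(uint8_t *ptr, uint32_t bytes, unsigned int len,
                          struct jit_hash *h)
{
    uint8_t intended[4];

    if (len > 4)
        len = 4;
    for (unsigned int i = 0; i < len; i++) {
        intended[i] = (uint8_t)(bytes >> (8 * i));
        ptr[i] = intended[i];
    }
    if (h)
        jit_hash_update(h, intended, len);
    return ptr + len;
}

/* Re-hash the finished image; returns 1 if it matches what was emitted. */
static int jit_hash_verify(const struct jit_hash *emitted,
                           const uint8_t *image, size_t len)
{
    struct jit_hash check;

    jit_hash_init(&check);
    jit_hash_update(&check, image, len);
    return check.state == emitted->state;
}
```

The arch JIT would only ever call the emit helper; the init and the final verify would live in common code, with the verify running once the image is read-only.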

-Kees

> ---
>  arch/x86/Kconfig|  11 
>  arch/x86/net/bpf_jit_comp.c | 147 
> 
>  2 files changed, 147 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cc98d5a..7b2db2c 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2789,6 +2789,17 @@ config X86_DMA_REMAP
>
>  source "net/Kconfig"
>
> +config EBPF_JIT_HASH_OUTPUT
> +   def_bool y
> +   depends on HAVE_EBPF_JIT
> +   depends on BPF_JIT
> +   select CRYPTO_SHA256
> +   ---help---
> + Enables a double check of the JIT's output after it is marked 
> read-only to
> + ensure that it matches what the JIT generated.
> +
> + Note, only applies when /proc/sys/net/core/bpf_jit_harden > 0.
> +
>  source "drivers/Kconfig"
>
>  source "drivers/firmware/Kconfig"
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 32322ce..be1271e 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -13,9 +13,15 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
>  int bpf_jit_enable __read_mostly;
>
> +#ifdef CONFIG_EBPF_JIT_HASH_OUTPUT
> +struct crypto_shash *tfm __read_mostly;
> +#endif
> +
>  /*
>   * assembly code in arch/x86/net/bpf_jit.S
>   */
> @@ -25,7 +31,8 @@ extern u8 sk_load_byte_positive_offset[];
>  extern u8 sk_load_word_negative_offset[], sk_load_half_negative_offset[];
>  extern u8 sk_load_byte_negative_offset[];
>
> -static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
> +static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len,
> +struct shash_desc *hash)
>  {
> if (len == 1)
> *ptr = bytes;
> @@ -35,11 +42,15 @@ static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
> *(u32 *)ptr = bytes;
> barrier();
> }
> +
> +   if (IS_ENABLED(CONFIG_EBPF_JIT_HASH_OUTPUT) && hash)
> +   crypto_shash_update(hash, (u8 *) &bytes, len);
> +
> return ptr + len;
>  }
>
>  #define EMIT(bytes, len) \
> -   do { prog = emit_code(prog, bytes, len); cnt += len; } while (0)
> +   do { prog = emit_code(prog, bytes, len, hash); cnt += len; } while (0)
>
>  #define EMIT1(b1)  EMIT(b1, 1)
>  #define EMIT2(b1, b2)  EMIT((b1) + ((b2) << 8), 2)
> @@ -206,7 +217,7 @@ struct jit_context {
>  /* emit x64 prologue code for BPF program and check it's size.
>   * bpf_tail_call helper will skip it while jumping into another program
>   */
> -static void emit_prologue(u8 **pprog)
> +static void emit_prologue(u8 **pprog, struct shash_desc *hash)
>  {
> u8 *prog = *pprog;
> int cnt = 0;
> @@ -264,7 +275,7 @@ static void emit_prologue(u8 **pprog)
>   *   goto *(prog->bpf_func + prologue_size);
>   * out:
>   */
> -static void emit_bpf_tail_call(u8 **pprog)
> +static void emit_bpf_tail_call(u8 **pprog, struct shash_desc *hash)
>  {
> u8 *prog = *pprog;
> int label1, label2, label3;
> @@ -328,7 +339,7 @@ static void emit_bpf_tail_call(u8 **pprog)
>  }
>
>
> -static void emit_load_skb_data_hlen(u8 **pprog)
> +static void emit_load_skb_data_hlen(u8 **pprog, struct shash_desc *hash)
>  {
> u8 *prog = *pprog;
> int cnt = 0;
> @@ -348,7 +359,8 @@ static void emit_load_skb_data_hlen(u8 **pprog)
>  }
>
>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
> - 

[PATCH] af_unix: Use designated initializers

2017-04-04 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, and the initializer fixes
were extracted from grsecurity. In this case, NULL initialize with { }
instead of undesignated NULLs.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 net/unix/af_unix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 928691c43408..6a7fe7660551 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -996,7 +996,7 @@ static int unix_bind(struct socket *sock, struct sockaddr 
*uaddr, int addr_len)
unsigned int hash;
struct unix_address *addr;
struct hlist_head *list;
-   struct path path = { NULL, NULL };
+   struct path path = { };
 
err = -EINVAL;
if (sunaddr->sun_family != AF_UNIX)
-- 
2.7.4


-- 
Kees Cook
Pixel Security
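
For readers wondering why the af_unix patch above bothers, a small userspace sketch of the difference. struct path_like is a hypothetical stand-in for the kernel's struct path, and `{ 0 }` is used as the portable spelling of the patch's `{ }` (which is GNU C, and standard only since C23):

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct path. */
struct path_like {
    void *mnt;
    void *dentry;
};

/* Positional form: silently depends on mnt being declared before dentry. */
static struct path_like path_positional(void *mnt, void *dentry)
{
    struct path_like p = { mnt, dentry };
    return p;
}

/* Designated form: stays correct for any field order. */
static struct path_like path_designated(void *mnt, void *dentry)
{
    struct path_like p = { .dentry = dentry, .mnt = mnt };
    return p;
}

/* Zeroing form: every member is NULL regardless of layout. */
static struct path_like path_empty(void)
{
    struct path_like p = { 0 };
    return p;
}
```

With the declaration order shown, all three behave the same; only the positional form would change meaning if the two members were reordered.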


Re: [PATCH net-next v6 09/11] seccomp: Enhance test_harness with an assert step mechanism

2017-04-19 Thread Kees Cook
On Wed, Apr 19, 2017 at 2:51 PM, Mickaël Salaün <m...@digikod.net> wrote:
>
> On 19/04/2017 02:02, Kees Cook wrote:
>> On Tue, Mar 28, 2017 at 4:46 PM, Mickaël Salaün <m...@digikod.net> wrote:
>>> This is useful to return an information about the error without being
>>> able to write to TH_LOG_STREAM.
>>>
>>> Helpers from test_harness.h may be useful outside of the seccomp
>>> directory.
>>>
>>> Signed-off-by: Mickaël Salaün <m...@digikod.net>
>>> Cc: Andy Lutomirski <l...@amacapital.net>
>>> Cc: Arnaldo Carvalho de Melo <a...@kernel.org>
>>> Cc: Kees Cook <keesc...@chromium.org>
>>> Cc: Shuah Khan <sh...@kernel.org>
>>> Cc: Will Drewry <w...@chromium.org>
>>> ---
>>>  tools/testing/selftests/seccomp/test_harness.h | 8 +++-
>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/seccomp/test_harness.h 
>>> b/tools/testing/selftests/seccomp/test_harness.h
>>> index a786c69c7584..77e407663e06 100644
>>> --- a/tools/testing/selftests/seccomp/test_harness.h
>>> +++ b/tools/testing/selftests/seccomp/test_harness.h
>>> @@ -397,7 +397,7 @@ struct __test_metadata {
>>> const char *name;
>>> void (*fn)(struct __test_metadata *);
>>> int termsig;
>>> -   int passed;
>>> +   __s8 passed;
>>
>> Why the reduction here? int is signed too?
>
> Because the return code of a process is capped to 8 bits and I use a
> negative value to not mess with the current interpretation of 0 (error)
> and 1 (OK) for the "passed" variable.
>
>>
>>> int trigger; /* extra handler after the evaluation */
>>> struct __test_metadata *prev, *next;
>>>  };
>>> @@ -476,6 +476,12 @@ void __run_test(struct __test_metadata *t)
>>> "instead of by signal (code: %d)\n",
>>> t->name,
>>> WEXITSTATUS(status));
>>> +   } else if (t->passed < 0) {
>>> +   fprintf(TH_LOG_STREAM,
>>> +   "%s: Failed at step #%d\n",
>>> +   t->name,
>>> +   t->passed * -1);
>>> +   t->passed = 0;
>>> }
>>
>> Instead of creating an overloaded mechanism here, perhaps have an
>> optional reporting mechanism that can be enabled. Like adding to
>> __test_metadata "bool no_stream; int test_number;" and adding
>> test_number++ to each ASSERT/EXPECT call, and doing something like:
>>
>> if (t->no_stream) {
>>   fprintf(TH_LOG_STREAM,
>>   "%s: Failed at step #%d\n",
>>   t->name,
>>t->test_number);
>> }
>>
>> It'd be a cleaner approach, maybe?
>
> Good idea, we will then be able to use 255 steps!
>
> Do you want me to send this as a separate patch?
>
> Can we move test_harness.h outside of the seccomp directory to be
> available to other subsystems as well?

Yeah, I would do two patches, and send them out separately (to shuah
with lkml and me in cc at least), one to move test_harness.h into
some include/ directory, and then to add the new logic for streamless
reporting.
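
For the second patch, something along these lines is what I was picturing. This is only a sketch: struct test_metadata is a trimmed-down, hypothetical version of __test_metadata, and CHECK_STEP() and report_failure() are made-up names rather than anything in test_harness.h:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed-down version of struct __test_metadata. */
struct test_metadata {
    const char *name;
    int passed;       /* keeps its plain 1 (OK) / 0 (error) meaning */
    bool no_stream;   /* the test cannot write to the log stream itself */
    int test_number;  /* index of the last ASSERT/EXPECT step reached */
};

/* Each check bumps the step counter until the first failure is recorded. */
#define CHECK_STEP(t, cond) \
    do { \
        if ((t)->passed) { \
            (t)->test_number++; \
            if (!(cond)) \
                (t)->passed = 0; \
        } \
    } while (0)

/* Parent-side report; returns the number of characters written to `out`. */
static int report_failure(const struct test_metadata *t, char *out, size_t len)
{
    if (t->passed || !t->no_stream)
        return 0;
    return snprintf(out, len, "%s: Failed at step #%d\n",
                    t->name, t->test_number);
}
```

This keeps "passed" as a plain int, so the step number is no longer squeezed into a signed 8-bit value.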

Thanks!

-Kees


-- 
Kees Cook
Pixel Security


Re: [PATCH net-next v6 09/11] seccomp: Enhance test_harness with an assert step mechanism

2017-04-19 Thread Kees Cook
On Wed, Apr 19, 2017 at 3:05 PM, Mickaël Salaün <m...@digikod.net> wrote:
>
>
> On 20/04/2017 00:02, Kees Cook wrote:
>> On Wed, Apr 19, 2017 at 2:51 PM, Mickaël Salaün <m...@digikod.net> wrote:
>>>
>>> On 19/04/2017 02:02, Kees Cook wrote:
>>>> On Tue, Mar 28, 2017 at 4:46 PM, Mickaël Salaün <m...@digikod.net> wrote:
>>>>> This is useful to return an information about the error without being
>>>>> able to write to TH_LOG_STREAM.
>>>>>
>>>>> Helpers from test_harness.h may be useful outside of the seccomp
>>>>> directory.
>>>>>
>>>>> Signed-off-by: Mickaël Salaün <m...@digikod.net>
>>>>> Cc: Andy Lutomirski <l...@amacapital.net>
>>>>> Cc: Arnaldo Carvalho de Melo <a...@kernel.org>
>>>>> Cc: Kees Cook <keesc...@chromium.org>
>>>>> Cc: Shuah Khan <sh...@kernel.org>
>>>>> Cc: Will Drewry <w...@chromium.org>
>>>>> ---
>>>>>  tools/testing/selftests/seccomp/test_harness.h | 8 +++-
>>>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/tools/testing/selftests/seccomp/test_harness.h 
>>>>> b/tools/testing/selftests/seccomp/test_harness.h
>>>>> index a786c69c7584..77e407663e06 100644
>>>>> --- a/tools/testing/selftests/seccomp/test_harness.h
>>>>> +++ b/tools/testing/selftests/seccomp/test_harness.h
>>>>> @@ -397,7 +397,7 @@ struct __test_metadata {
>>>>> const char *name;
>>>>> void (*fn)(struct __test_metadata *);
>>>>> int termsig;
>>>>> -   int passed;
>>>>> +   __s8 passed;
>>>>
>>>> Why the reduction here? int is signed too?
>>>
>>> Because the return code of a process is capped to 8 bits and I use a
>>> negative value to not mess with the current interpretation of 0 (error)
>>> and 1 (OK) for the "passed" variable.
>>>
>>>>
>>>>> int trigger; /* extra handler after the evaluation */
>>>>> struct __test_metadata *prev, *next;
>>>>>  };
>>>>> @@ -476,6 +476,12 @@ void __run_test(struct __test_metadata *t)
>>>>> "instead of by signal (code: 
>>>>> %d)\n",
>>>>> t->name,
>>>>> WEXITSTATUS(status));
>>>>> +   } else if (t->passed < 0) {
>>>>> +   fprintf(TH_LOG_STREAM,
>>>>> +   "%s: Failed at step #%d\n",
>>>>> +   t->name,
>>>>> +   t->passed * -1);
>>>>> +   t->passed = 0;
>>>>> }
>>>>
>>>> Instead of creating an overloaded mechanism here, perhaps have an
>>>> optional reporting mechanism that can be enabled. Like adding to
>>>> __test_metadata "bool no_stream; int test_number;" and adding
>>>> test_number++ to each ASSERT/EXPECT call, and doing something like:
>>>>
>>>> if (t->no_stream) {
>>>>   fprintf(TH_LOG_STREAM,
>>>>   "%s: Failed at step #%d\n",
>>>>   t->name,
>>>>t->test_number);
>>>> }
>>>>
>>>> It'd be a cleaner approach, maybe?
>>>
>>> Good idea, we will then be able to use 255 steps!
>>>
>>> Do you want me to send this as a separate patch?
>>>
>>> Can we move test_harness.h outside of the seccomp directory to be
>>> available to other subsystems as well?
>>
>> Yeah, I would do two patches, and send them out separately (to shuah
>> with lkml and me in cc at least), one to move test_harness.h into
>> some include/ directory, and then to add the new logic for streamless
>> reporting.
>>
>> Thanks!
>>
>
> Good, in which place and name would it fit better?

I've added Shuah to CC. Shuah, where should a common header file for
selftests live? Should a new "include" directory be added?

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH net-next v6 05/11] seccomp: Split put_seccomp_filter() with put_seccomp()

2017-04-19 Thread Kees Cook
On Wed, Apr 19, 2017 at 3:18 PM, Mickaël Salaün <m...@digikod.net> wrote:
>
> On 19/04/2017 00:47, Mickaël Salaün wrote:
>>
>> On 19/04/2017 00:23, Kees Cook wrote:
>>> On Tue, Mar 28, 2017 at 4:46 PM, Mickaël Salaün <m...@digikod.net> wrote:
>>>> The semantic is unchanged. This will be useful for the Landlock
>>>> integration with seccomp (next commit).
>>>>
>>>> Signed-off-by: Mickaël Salaün <m...@digikod.net>
>>>> Cc: Kees Cook <keesc...@chromium.org>
>>>> Cc: Andy Lutomirski <l...@amacapital.net>
>>>> Cc: Will Drewry <w...@chromium.org>
>>>> ---
>>>>  include/linux/seccomp.h |  4 ++--
>>>>  kernel/fork.c   |  2 +-
>>>>  kernel/seccomp.c| 18 +-
>>>>  3 files changed, 16 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>>>> index ecc296c137cd..e25aee2cdfc0 100644
>>>> --- a/include/linux/seccomp.h
>>>> +++ b/include/linux/seccomp.h
>>>> @@ -77,10 +77,10 @@ static inline int seccomp_mode(struct seccomp *s)
>>>>  #endif /* CONFIG_SECCOMP */
>>>>
>>>>  #ifdef CONFIG_SECCOMP_FILTER
>>>> -extern void put_seccomp_filter(struct task_struct *tsk);
>>>> +extern void put_seccomp(struct task_struct *tsk);
>>>>  extern void get_seccomp_filter(struct task_struct *tsk);
>>>>  #else  /* CONFIG_SECCOMP_FILTER */
>>>> -static inline void put_seccomp_filter(struct task_struct *tsk)
>>>> +static inline void put_seccomp(struct task_struct *tsk)
>>>>  {
>>>> return;
>>>>  }
>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>> index 6c463c80e93d..a27d8e67ce33 100644
>>>> --- a/kernel/fork.c
>>>> +++ b/kernel/fork.c
>>>> @@ -363,7 +363,7 @@ void free_task(struct task_struct *tsk)
>>>>  #endif
>>>> rt_mutex_debug_task_free(tsk);
>>>> ftrace_graph_exit_task(tsk);
>>>> -   put_seccomp_filter(tsk);
>>>> +   put_seccomp(tsk);
>>>> arch_release_task_struct(tsk);
>>>> if (tsk->flags & PF_KTHREAD)
>>>> free_kthread_struct(tsk);
>>>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>>>> index 65f61077ad50..326f79e32127 100644
>>>> --- a/kernel/seccomp.c
>>>> +++ b/kernel/seccomp.c
>>>> @@ -64,6 +64,8 @@ struct seccomp_filter {
>>>>  /* Limit any path through the tree to 256KB worth of instructions. */
>>>>  #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
>>>>
>>>> +static void put_seccomp_filter(struct seccomp_filter *filter);
>>>
>>> Can this be reorganized easily to avoid a forward-declaration?
>>
>> I didn't want to move too much code but I will.
>>
>>>
>>>> +
>>>>  /*
>>>>   * Endianness is explicitly ignored and left for BPF program authors to 
>>>> manage
>>>>   * as per the specific architecture.
>>>> @@ -314,7 +316,7 @@ static inline void seccomp_sync_threads(void)
>>>>  * current's path will hold a reference.  (This also
>>>>  * allows a put before the assignment.)
>>>>  */
>>>> -   put_seccomp_filter(thread);
>>>> +   put_seccomp_filter(thread->seccomp.filter);
>>>> smp_store_release(&thread->seccomp.filter,
>>>>   caller->seccomp.filter);
>>>>
>>>> @@ -476,10 +478,11 @@ static inline void seccomp_filter_free(struct 
>>>> seccomp_filter *filter)
>>>> }
>>>>  }
>>>>
>>>> -/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
>>>> -void put_seccomp_filter(struct task_struct *tsk)
>>>> +/* put_seccomp_filter - decrements the ref count of a filter */
>>>> +static void put_seccomp_filter(struct seccomp_filter *filter)
>>>>  {
>>>> -   struct seccomp_filter *orig = tsk->seccomp.filter;
>>>> +   struct seccomp_filter *orig = filter;
>>>> +
>>>> /* Clean up single-reference branches iteratively. */
>>>> while (orig && atomic_dec_and_test(&orig->usage)) {
>>>> struct seccomp_filter *freeme = orig;
>>>> @@ -488,6 +491,11 @@ void put_seccomp_filter(struct task_struct *tsk)
>>>> }
>>>>  }
>>>>
>>>> +void put_seccomp(struct task_struct *tsk)
>>>> +{
>>>> +   put_seccomp_filter(tsk->seccomp.filter);
>>>> +}
>>>> +
>>>>  static void seccomp_init_siginfo(siginfo_t *info, int syscall, int reason)
>>>>  {
>>>> memset(info, 0, sizeof(*info));
>>>> @@ -914,7 +922,7 @@ long seccomp_get_filter(struct task_struct *task, 
>>>> unsigned long filter_off,
>>>> if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
>>>> ret = -EFAULT;
>>>>
>>>> -   put_seccomp_filter(task);
>>>> +   put_seccomp_filter(task->seccomp.filter);
>>>> return ret;
>>>
>>> I don't like that the arguments to get_seccomp_filter() and
>>> put_seccomp_filter() are now different. I think they should match for
>>> readability.
>>
>> OK, I can do that.
>>
>
> Kees, can I send this as a separate patch?

Sure! Though I still think the argument to get/put_seccomp_filter()
should be task_struct.
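
To illustrate the shape I mean, here is a userspace sketch. The structures are simplified stand-ins, the refcount is a plain int rather than an atomic, and do_put_seccomp_filter() is a hypothetical name for the filter-based helper:

```c
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct seccomp_filter {
    int usage;                    /* plain int here; atomic in the kernel */
    struct seccomp_filter *prev;  /* parent filter in the tree */
};

struct task {
    struct seccomp_filter *filter;
};

/* Internal helper: operates on a filter directly. */
static void do_put_seccomp_filter(struct seccomp_filter *filter)
{
    /* Clean up single-reference branches iteratively. */
    while (filter && --filter->usage == 0) {
        struct seccomp_filter *freeme = filter;

        filter = filter->prev;
        free(freeme);
    }
}

/* Public pair: both take the task, so call sites read symmetrically. */
static void get_seccomp_filter(struct task *tsk)
{
    if (tsk->filter)
        tsk->filter->usage++;
}

static void put_seccomp_filter(struct task *tsk)
{
    do_put_seccomp_filter(tsk->filter);
}
```

Callers that only have a filter pointer use the internal helper, while get/put on a task keep matching signatures.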

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH net-next v6 04/11] landlock: Add LSM hooks related to filesystem

2017-04-19 Thread Kees Cook
On Wed, Apr 19, 2017 at 3:03 PM, Mickaël Salaün <m...@digikod.net> wrote:
>
> On 19/04/2017 01:40, Kees Cook wrote:
>> On Tue, Apr 18, 2017 at 4:16 PM, Casey Schaufler <ca...@schaufler-ca.com> 
>> wrote:
>>> On 4/18/2017 3:44 PM, Mickaël Salaün wrote:
>>>> On 19/04/2017 00:17, Kees Cook wrote:
>>>>> On Tue, Mar 28, 2017 at 4:46 PM, Mickaël Salaün <m...@digikod.net> wrote:
>>>>>> +void __init landlock_add_hooks(void)
>>>>>> +{
>>>>>> +   pr_info("landlock: Version %u", LANDLOCK_VERSION);
>>>>>> +   landlock_add_hooks_fs();
>>>>>> +   security_add_hooks(NULL, 0, "landlock");
>>>>>> +   bpf_register_prog_type(_landlock_type);
>>>>> I'm confused by the separation of hook registration here. The call to
>>>>> security_add_hooks is with count=0 is especially weird. Why isn't this
>>>>> just a single call with security_add_hooks(landlock_hooks,
>>>>> ARRAY_SIZE(landlock_hooks), "landlock")?
>>>> Yes, this is ugly with the new security_add_hooks() with three arguments
>>>> but I wanted to split the hooks definition in multiple files.
>>>
>>> Why? I'll buy a good argument, but there are dangers in
>>> allowing multiple calls to security_add_hooks().
>
> I prefer to have one file per hook "family" (e.g. filesystem, network,
> ptrace…). This reduces the mess with all the included files (needed for
> LSM hook argument types) and makes the files easier to read, understand
> and maintain.
>
>>>
>>>>
>>>> The current security_add_hooks() uses lsm_append(lsm, &lsm_names) which
>>>> is not exported. Unfortunately, calling multiple security_add_hooks()
>>>> with the same LSM name would register multiple names for the same LSM…
>>>> Is it OK if I modify this function to not add duplicated entries?
>>>
>>> It may seem absurd, but it's conceivable that a module might
>>> have two hooks it wants called. My example is a module that
>>> counts the number of times SELinux denies a process access to
>>> things (which needs to be called before and after SELinux in
>>> order to detect denials) and takes "appropriate action" if
>>> too many denials occur. It would be weird, wonky and hackish,
>>> but that never stopped anybody before.
>
> Right, but now, with the new lsm_append(), module names are concatenated
> ("%s,%s") in the lsm_names variable. It would be nice to not pollute
> this string with the same module name multiple times.

Perhaps security_add_hooks could be modified to accept a NULL lsm to
skip the lsm_append() call, so it could do:

security_add_hooks(hooks1, count1, NULL);
security_add_hooks(hooks2, count2, NULL);
security_add_hooks(NULL, 0, "landlock");

Or, as Casey suggests, disregard adding the name when it already exists:

security_add_hooks(hooks1, count1, "landlock");
security_add_hooks(hooks2, count2, "landlock");

Yeah, I think I prefer this...
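
A rough userspace sketch of the name handling, covering both ideas (NULL skips the append, and a repeated name is ignored). The fixed-size buffer and the lsm_name_present() and lsm_append_once() names are assumptions for illustration, not the existing lsm_append():

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for the kernel's lsm_names string. */
static char lsm_names[128];

/* Return 1 if `name` is already one of the comma-separated entries. */
static int lsm_name_present(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == n && memcmp(p, name, n) == 0)
            return 1;
        if (!end)
            break;
        p = end + 1;
    }
    return 0;
}

/* Append `name` unless it is NULL or already listed; returns 0 on success. */
static int lsm_append_once(const char *name)
{
    size_t used = strlen(lsm_names);
    int n;

    if (!name || lsm_name_present(lsm_names, name))
        return 0;
    n = snprintf(lsm_names + used, sizeof(lsm_names) - used, "%s%s",
                 used ? "," : "", name);
    if (n < 0 || (size_t)n >= sizeof(lsm_names) - used) {
        lsm_names[used] = '\0'; /* undo a truncated append */
        return -1;
    }
    return 0;
}
```

The comparison is per entry, so a name that is merely a prefix of an existing one is still added.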

-Kees

>
>>
>> If it ends up being sane and clear, I'm fine with allowing multiple calls.
>>
>> -Kees
>>
>



-- 
Kees Cook
Pixel Security


Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Kees Cook
On Wed, Mar 8, 2017 at 3:55 PM, Laura Abbott <labb...@redhat.com> wrote:
> On 03/08/2017 02:36 PM, Kees Cook wrote:
>> On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
>>> [   28.474232] rodata_test: test data was not read only
>>> [...]
>>
>> In my tests so far, I've never been able to get rodata_test to fail
>> (Qemu 2.5.0, Ubuntu). I'll retry with your .config and see if I can
>> recheck under Qemu 2.7.1. Do you see these failures on real hardware?
>>
>> -Kees
>>
>
> FWIW, I'm seeing the same issue with qemu 2.6.2 and 2.8.0 on Fedora 24
> and rawhide respectively.
>
> I also notice that CONFIG_X86_PAE is turned off in the defconfig. If
> I set CONFIG_HIGHMEM_64G which turns on CONFIG_X86_PAE the problem
> goes away. I can't tell if this is an indication of magically hiding
> the TLB problem or if there is an issue with !X86_PAE invalidation.

I found my difference. I normally run qemu with "-cpu host" which
makes the failure go away. With "-cpu kvm64", I see the rodata_test
failure immediately. Seems like this may be a kvm cpu feature
emulation bug? I'll see if I can find the specific cpu feature in the
morning...

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH net] bpf: disable broken write protection on i386

2017-03-08 Thread Kees Cook
On Mon, Mar 6, 2017 at 10:11 AM, Kees Cook <keesc...@chromium.org> wrote:
> On Fri, Mar 3, 2017 at 7:23 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
>> Latter shows that memory protecting the kernel seems not working either
>> on i386 (!). Test suite output:
>>
>>   [...]
>>   [   12.692836] Write protecting the kernel text: 13416k
>>   [   12.693309] Write protecting the kernel read-only data: 5292k
>>   [   12.693802] rodata_test: test data was not read only
>>   [...]
>>
>> Work-around to not enable ARCH_HAS_SET_MEMORY for i386 is not optimal
>> as it doesn't fix the issue in presumably broken set_memory_*(), but
>> it at least spares people from having to deal with random corruptions
>> that are hard to track down for the time being until a real fix can
>> be found.
>
> Wow. Uhm, so, something must be _really_ broken. i386 should have no
> problem with using the set_memory_*() functions. The fact that
> DEBUG_RODATA_TEST failed is also pretty crazy, but may be unrelated
> (that test was just refactored too).

I'm not able to reproduce this. I built Linus's tree and rodata_test
passes for me on i386. I tried the .config from Fengguang (with
RODATA_TEST=y added), but it still passes for me:
https://lkml.org/lkml/2017/3/1/344

I wonder if something changed already in the tree? Can you
still reproduce this?

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] x86-32: fix tlb flushing when lguest clears PGE

2017-03-12 Thread Kees Cook
Are there nominations for most comprehensive changelog of the year? :)
This is awesome.

-Kees

On Fri, Mar 10, 2017 at 6:31 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> Fengguang reported [1] random corruptions from various locations on
> x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY
> config") and 9d876e79df6a ("bpf: fix unlocking of jited image when
> module ronx not set") that uses the former. While x86-32 doesn't
> have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro()
> got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test
> kernel doesn't have module support built in and therefore never
> had the DEBUG_SET_MODULE_RONX setting enabled.
>
> After investigating the crashes further, it turned out that using
> set_memory_ro() and set_memory_rw() didn't have the desired effect,
> for example, setting the pages as read-only on x86-32 would still
> let probe_kernel_write() succeed without error. This behavior would
> manifest itself in situations where the vmalloc'ed buffer was accessed
> prior to set_memory_*() such as in case of bpf_prog_alloc(). In
> cases where it wasn't, the page attribute changes seemed to have
> taken effect, leading to the conclusion that a TLB invalidate
> didn't happen. Moreover, it turned out that this issue reproduced
> with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the
> issue occurs, change_page_attr_set_clr() did trigger a TLB flush
> as expected via __flush_tlb_all() through cpa_flush_range(), though.
>
> There are 3 variants for issuing a TLB flush: invpcid_flush_all()
> (depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE),
> cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush.
> For "-cpu host" case in my setup, the flush used invpcid_flush_all()
> variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching
> the kvm64 case to cr3 manually worked fine, and further investigating
> the cr4 one turned out that X86_CR4_PGE bit was not set in cr4
> register, meaning the __native_flush_tlb_global_irq_disabled() wrote
> cr4 twice with the same value instead of clearing X86_CR4_PGE in the
> first write to trigger the flush.
>
> It turned out that X86_CR4_PGE was cleared from cr4 during init
> from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE
> bit is also cleared from there due to concerns of using PGE in
> guest kernel that can lead to hard to trace bugs (see bff672e630a0
> ("lguest: documentation V: Host") in init()). The CPU feature bits
> are cleared in dynamic boot_cpu_data, but they never propagated to
> __flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has()
> for testing which variant of TLB flushing to use, meaning they still
> used the old setting of the host kernel.
>
> Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would
> propagate to static_cpu_has() checks is too late at this point as
> sections have been patched already, so for now, it seems reasonable
> to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to
> commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This
> lets the TLB flush trigger via cr3 as originally intended, properly
> makes the new page attributes visible and thus fixes the crashes
> seen by Fengguang.
>
>   [1] https://lkml.org/lkml/2017/3/1/344
>
> Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge")
> Reported-by: Fengguang Wu <fengguang...@intel.com>
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> Cc: Borislav Petkov <b...@suse.de>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Laura Abbott <labb...@redhat.com>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: H. Peter Anvin <h...@zytor.com>
> Cc: Rusty Russell <ru...@rustcorp.com.au>
> Cc: Alexei Starovoitov <a...@kernel.org>
> Cc: David S. Miller <da...@davemloft.net>
> ---
>  arch/x86/include/asm/tlbflush.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 6fa8594..fc5abff 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -188,7 +188,7 @@ static inline void __native_flush_tlb_single(unsigned 
> long addr)
>
>  static inline void __flush_tlb_all(void)
>  {
> -   if (static_cpu_has(X86_FEATURE_PGE))
> +   if (boot_cpu_has(X86_FEATURE_PGE))
> __flush_tlb_global();
> else
> __flush_tlb();
> --
> 1.9.3
>



-- 
Kees Cook
Pixel Security


Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Kees Cook
On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> [   28.474232] rodata_test: test data was not read only
> [...]

In my tests so far, I've never been able to get rodata_test to fail
(Qemu 2.5.0, Ubuntu). I'll retry with your .config and see if I can
recheck under Qemu 2.7.1. Do you see these failures on real hardware?

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH net] bpf: disable broken write protection on i386

2017-03-06 Thread Kees Cook
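Wow. Uhm, so, something must be _really_ broken. i386 should have no
problem with using the set_memory_*() functions. The fact that
DEBUG_RODATA_TEST failed is also pretty crazy, but may be unrelated
(that test was just refactored too).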

Is it possible that it's just the enabling of set_memory_*() for the
non-modular case? The ARCH_HAS_SET_MEMORY commit is just a convenience
config; i386 has had those functions for a while now, and they're the
same between x86_64 and i386. O_o Perhaps they aren't safe on i386 for
non-modular addresses?

I do see a few X86_32 and X86_64 differences in arch/x86/mm/pageattr.c,
though. I wonder about __set_pmd_pte(), but I haven't looked closely
at x86 paging code before...

>
>   [1] https://lkml.org/lkml/2017/3/2/648
>
> Reported-by: Fengguang Wu <fengguang...@intel.com>
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> Cc: Laura Abbott <labb...@redhat.com>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Alexei Starovoitov <a...@kernel.org>
> ---
>  [ Sending to -net as bpf related, but I don't mind to route it
>elsewhere, too. ]
>
>  arch/x86/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cc98d5a..626dc6a 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -54,7 +54,7 @@ config X86
> select ARCH_HAS_KCOVif X86_64
> select ARCH_HAS_MMIO_FLUSH
> select ARCH_HAS_PMEM_API    if X86_64
> -   select ARCH_HAS_SET_MEMORY
> +   select ARCH_HAS_SET_MEMORY  if X86_64
> select ARCH_HAS_SG_CHAIN
> select ARCH_HAS_STRICT_KERNEL_RWX
> select ARCH_HAS_STRICT_MODULE_RWX
> --
> 1.9.3
>

I'm okay with this patch since only BPF pays attention to that CONFIG,
but we need to fix the problem. :)

-Kees

-- 
Kees Cook
Pixel Security


[PATCH] qlge: avoid format string exposure in workqueue

2017-04-05 Thread Kees Cook
While unlikely, this makes sure the workqueue name won't be processed
as a format string.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/net/ethernet/qlogic/qlge/qlge_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c 
b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index e9e647072596..1188d420fe53 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -4686,7 +4686,8 @@ static int ql_init_device(struct pci_dev *pdev, struct net_device *ndev,
/*
 * Set up the operating parameters.
 */
-   qdev->workqueue = alloc_ordered_workqueue(ndev->name, WQ_MEM_RECLAIM);
+   qdev->workqueue = alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM,
+ ndev->name);
INIT_DELAYED_WORK(&qdev->asic_reset_work, ql_asic_reset_work);
INIT_DELAYED_WORK(&qdev->mpi_reset_work, ql_mpi_reset_work);
INIT_DELAYED_WORK(&qdev->mpi_work, ql_mpi_work);
-- 
2.7.4


-- 
Kees Cook
Pixel Security
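
For readers following the format-string patches in this thread, here is a
minimal userspace sketch of the same idea. It is not kernel code:
make_name_safe() is a hypothetical helper, and snprintf() merely stands in
for the printf-style name argument that alloc_ordered_workqueue() takes.
The point is only that a name which may contain '%' must be passed as data
to a constant "%s" format, never as the format itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for building a workqueue name.
 * dev_name plays the role of ndev->name and may contain '%' characters.
 *
 * The unsafe form would be:
 *     snprintf(name_out, size, dev_name);
 * which interprets any "%d", "%s", "%x" in dev_name as conversions and
 * reads arguments that were never passed.
 */
static int make_name_safe(char *name_out, size_t size, const char *dev_name)
{
	assert(name_out && dev_name && size > 0);

	/* Constant format; the device name is only ever treated as data. */
	return snprintf(name_out, size, "%s", dev_name);
}
```

With this form a name such as "eth%d%s" is copied through unchanged
instead of being expanded.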


[PATCH] net: ethernet: wiznet: avoid format string exposure

2017-04-05 Thread Kees Cook
While unlikely, this makes sure any format strings in the device name
can't expose information via the resulting workqueue name.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/net/ethernet/wiznet/w5100.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/wiznet/w5100.c b/drivers/net/ethernet/wiznet/w5100.c
index f90267f0519f..2bdfb39215e9 100644
--- a/drivers/net/ethernet/wiznet/w5100.c
+++ b/drivers/net/ethernet/wiznet/w5100.c
@@ -1152,7 +1152,8 @@ int w5100_probe(struct device *dev, const struct w5100_ops *ops,
if (err < 0)
goto err_register;
 
-   priv->xfer_wq = alloc_workqueue(netdev_name(ndev), WQ_MEM_RECLAIM, 0);
+   priv->xfer_wq = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0,
+   netdev_name(ndev));
if (!priv->xfer_wq) {
err = -ENOMEM;
goto err_wq;
-- 
2.7.4


-- 
Kees Cook
Pixel Security


Re: [kernel-hardening] [PATCH net-next v6 06/11] seccomp,landlock: Handle Landlock events per process hierarchy

2017-04-18 Thread Kees Cook
On Fri, Mar 31, 2017 at 2:15 PM, Mickaël Salaün <m...@digikod.net> wrote:
>
>
> On 29/03/2017 12:35, Djalal Harouni wrote:
>> On Wed, Mar 29, 2017 at 1:46 AM, Mickaël Salaün <m...@digikod.net> wrote:
>
>>> @@ -25,6 +30,9 @@ struct seccomp_filter;
>>>  struct seccomp {
>>> int mode;
>>> struct seccomp_filter *filter;
>>> +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
>>> +   struct landlock_events *landlock_events;
>>> +#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
>>>  };
>>
>> Sorry if this was discussed before, but since this is meant to be a
>> stackable LSM, I'm wondering if later you could move the events from
>> seccomp, and go with a security_task_alloc() model [1] ?
>>
>> Thanks!
>>
>> [1] 
>> http://kernsec.org/pipermail/linux-security-module-archive/2017-March/000184.html
>>
>
> Landlock uses the seccomp syscall to attach a rule to a process, and using
> struct seccomp to store this rule makes sense. There is currently no way
> to store multiple task->security, which is needed for a stackable LSM
> like Landlock, but we could move the events there if needed in the future.

It does stand out to me that the only thing landlock is using seccomp
for is its syscall... :P

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH net-next v6 01/11] bpf: Add eBPF program subtype and is_valid_subtype() verifier

2017-04-18 Thread Kees Cook
On Tue, Mar 28, 2017 at 4:46 PM, Mickaël Salaün <m...@digikod.net> wrote:
> The goal of the program subtype is to be able to have different static
> fine-grained verifications for a unique program type.
>
> The struct bpf_verifier_ops gets a new optional function:
> is_valid_subtype(). This new verifier is called at the beginning of the
> eBPF program verification to check if the (optional) program subtype is
> valid.
>
> For now, only Landlock eBPF programs are using a program subtype (see
> next commit) but this could be used by other program types in the future.
>
> Changes since v5:
> * use a prog_subtype pointer and make it future-proof
> * add subtype test
> * constify bpf_load_program()'s subtype argument
> * cleanup subtype initialization
> * rebase
>
> Changes since v4:
> * replace the "status" field with "version" (more generic)
> * replace the "access" field with "ability" (less confusing)
>
> Changes since v3:
> * remove the "origin" field
> * add an "option" field
> * cleanup comments
>
> Signed-off-by: Mickaël Salaün <m...@digikod.net>
> Cc: Alexei Starovoitov <a...@kernel.org>
> Cc: Arnaldo Carvalho de Melo <a...@kernel.org>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: David S. Miller <da...@davemloft.net>
> Link: https://lkml.kernel.org/r/20160827205559.ga43...@ast-mbp.thefacebook.com
> ---
> [...]
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c35ebfe6d84d..3d07b10ade5e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -843,6 +879,26 @@ static int bpf_prog_load(union bpf_attr *attr)
> if (err < 0)
> goto free_prog;
>
> +   /* copy eBPF program subtype from user space */
> +   if (attr->prog_subtype) {
> +   __u32 size;
> +
> +   size = check_user_buf((void __user *)attr->prog_subtype,
> + attr->prog_subtype_size,
> + sizeof(prog->subtype));
> +   if (size < 0) {
> +   err = size;
> +   goto free_prog;
> +   }
> +   /* prog->subtype is __GFP_ZERO */
> +   if (copy_from_user(&prog->subtype,
> +      u64_to_user_ptr(attr->prog_subtype), size)
> +  != 0)

It might be worth adding a comment here about how the ToCToU of the
check-then-copy doesn't matter in this case, since it's just a
future-proofing of bits, etc.
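
For context, a minimal userspace sketch of the window in question and of
the ordering that closes it. This is not what the quoted patch does (the
patch checks the user buffer and then copies it, which is tolerable there
for the reason given above). The struct, the flag mask and
fetch_subtype() are all hypothetical, and memcpy() stands in for
copy_from_user(). The sketch fetches the caller's buffer once into a
private copy and validates only that copy, so a concurrent rewrite of the
source cannot slip past the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subtype layout, for illustration only. */
struct demo_subtype {
	uint32_t version;
	uint32_t flags;
};

#define DEMO_KNOWN_FLAGS 0x3u

/*
 * Copy-then-check: returns 0 and fills *out on success, -1 if the
 * private copy fails validation. A short src is zero-extended.
 */
static int fetch_subtype(struct demo_subtype *out, const void *src,
			 size_t src_size)
{
	struct demo_subtype tmp;
	size_t n;

	assert(out && src);

	memset(&tmp, 0, sizeof(tmp));
	n = src_size < sizeof(tmp) ? src_size : sizeof(tmp);
	memcpy(&tmp, src, n);		/* single fetch of the caller's data */

	/* Validate the private copy, never the caller's buffer. */
	if (tmp.version == 0 || (tmp.flags & ~DEMO_KNOWN_FLAGS))
		return -1;

	*out = tmp;
	return 0;
}
```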

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH net-next v6 02/11] bpf,landlock: Define an eBPF program type for Landlock

2017-04-18 Thread Kees Cook
On Tue, Mar 28, 2017 at 4:46 PM, Mickaël Salaün <m...@digikod.net> wrote:
> Add a new type of eBPF program used by Landlock rules.
>
> This new BPF program type will be registered with the Landlock LSM
> initialization.
>
> Add an initial Landlock Kconfig.
>
> Changes since v5:
> * rename file hooks.c to init.c
> * fix spelling
>
> Changes since v4:
> * merge a minimal (not enabled) LSM code and Kconfig in this commit
>
> Changes since v3:
> * split commit
> * revamp the landlock_context:
>   * add arch, syscall_nr and syscall_cmd (ioctl, fcntl…) to be able to
> cross-check action with the event type
>   * replace args array with dedicated fields to ease the addition of new
> fields
>
> Signed-off-by: Mickaël Salaün <m...@digikod.net>
> Cc: Alexei Starovoitov <a...@kernel.org>
> Cc: Andy Lutomirski <l...@amacapital.net>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: David S. Miller <da...@davemloft.net>
> Cc: James Morris <james.l.mor...@oracle.com>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Serge E. Hallyn <se...@hallyn.com>
> ---
> [...]
> +static inline bool bpf_landlock_is_valid_subtype(
> +   union bpf_prog_subtype *prog_subtype)
> +{
> +   if (WARN_ON(!prog_subtype))
> +   return false;
> +
> +   switch (prog_subtype->landlock_rule.event) {
> +   case LANDLOCK_SUBTYPE_EVENT_FS:
> +   break;
> +   case LANDLOCK_SUBTYPE_EVENT_UNSPEC:
> +   default:
> +   return false;
> +   }
> +
> +   if (!prog_subtype->landlock_rule.version ||
> +   prog_subtype->landlock_rule.version > 
> LANDLOCK_VERSION)
> +   return false;
> +   if (!prog_subtype->landlock_rule.event ||
> +   prog_subtype->landlock_rule.event > 
> _LANDLOCK_SUBTYPE_EVENT_LAST)
> +   return false;
> +   if (prog_subtype->landlock_rule.ability & 
> ~_LANDLOCK_SUBTYPE_ABILITY_MASK)
> +   return false;
> +   if (prog_subtype->landlock_rule.option & 
> ~_LANDLOCK_SUBTYPE_OPTION_MASK)
> +   return false;
> +
> +   /* check ability flags */
> +   if (prog_subtype->landlock_rule.ability & 
> LANDLOCK_SUBTYPE_ABILITY_WRITE &&
> +   !capable(CAP_SYS_ADMIN))
> +   return false;
> +   if (prog_subtype->landlock_rule.ability & 
> LANDLOCK_SUBTYPE_ABILITY_DEBUG &&
> +   !capable(CAP_SYS_ADMIN))
> +   return false;
> +
> +   return true;
> +}

I would add more comments for the rule and ability tests just to help
people read this.
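
As a rough illustration of the kind of commenting meant here, a userspace
sketch of the same sequence of tests. Every name and value below is a
placeholder invented for the sketch (the SK_* constants, struct sk_rule,
and the "privileged" flag standing in for capable(CAP_SYS_ADMIN)); only
the order and intent of the checks follow the quoted function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values chosen for this sketch only. */
#define SK_VERSION		1u
#define SK_EVENT_UNSPEC		0u
#define SK_EVENT_FS		1u
#define SK_EVENT_LAST		SK_EVENT_FS
#define SK_ABILITY_WRITE	(1u << 0)
#define SK_ABILITY_DEBUG	(1u << 1)
#define SK_ABILITY_MASK		(SK_ABILITY_WRITE | SK_ABILITY_DEBUG)
#define SK_OPTION_MASK		0u

struct sk_rule {
	uint32_t version;
	uint32_t event;
	uint32_t ability;
	uint32_t option;
};

static bool sk_rule_is_valid(const struct sk_rule *r, bool privileged)
{
	assert(r);

	/* Version 0 is rejected, as is any version newer than ours. */
	if (r->version == 0 || r->version > SK_VERSION)
		return false;

	/* The event must be a known one; UNSPEC is never acceptable. */
	if (r->event == SK_EVENT_UNSPEC || r->event > SK_EVENT_LAST)
		return false;

	/* Refuse unknown bits so they can be given a meaning later. */
	if (r->ability & ~SK_ABILITY_MASK)
		return false;
	if (r->option & ~SK_OPTION_MASK)
		return false;

	/* WRITE and DEBUG abilities are reserved for privileged callers. */
	if ((r->ability & (SK_ABILITY_WRITE | SK_ABILITY_DEBUG)) && !privileged)
		return false;

	return true;
}
```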

> +
> +static inline const struct bpf_func_proto *bpf_landlock_func_proto(
> +   enum bpf_func_id func_id, union bpf_prog_subtype 
> *prog_subtype)
> +{
> +   bool event_fs = (prog_subtype->landlock_rule.event ==
> +   LANDLOCK_SUBTYPE_EVENT_FS);
> +   bool ability_write = !!(prog_subtype->landlock_rule.ability &
> +   LANDLOCK_SUBTYPE_ABILITY_WRITE);
> +   bool ability_debug = !!(prog_subtype->landlock_rule.ability &
> +   LANDLOCK_SUBTYPE_ABILITY_DEBUG);
> +
> +   switch (func_id) {
> +   case BPF_FUNC_map_lookup_elem:
> +   return &bpf_map_lookup_elem_proto;
> +
> +   /* ability_write */
> +   case BPF_FUNC_map_delete_elem:
> +   if (ability_write)
> +   return &bpf_map_delete_elem_proto;
> +   return NULL;
> +   case BPF_FUNC_map_update_elem:
> +   if (ability_write)
> +   return &bpf_map_update_elem_proto;
> +   return NULL;
> +
> +   /* ability_debug */
> +   case BPF_FUNC_get_current_comm:
> +   if (ability_debug)
> +   return &bpf_get_current_comm_proto;
> +   return NULL;
> +   case BPF_FUNC_get_current_pid_tgid:
> +   if (ability_debug)
> +   return &bpf_get_current_pid_tgid_proto;
> +   return NULL;
> +   case BPF_FUNC_get_current_uid_gid:
> +   if (ability_debug)
> +   return &bpf_get_current_uid_gid_proto;
> +   return NULL;
> +   case BPF_FUNC_trace_printk:
> +   if (ability_debug)
> +   return bpf_get_trace_printk_proto();
> +   return NULL;
> +
> +   default:
> +   return NULL;
> +   }
> +}

I find this switch statement mixed with the "if (abi

Re: [PATCH net-next v6 04/11] landlock: Add LSM hooks related to filesystem

2017-04-18 Thread Kees Cook
On Tue, Mar 28, 2017 at 4:46 PM, Mickaël Salaün <m...@digikod.net> wrote:
> Handle 33 filesystem-related LSM hooks for the Landlock filesystem
> event: LANDLOCK_SUBTYPE_EVENT_FS.
>
> A Landlock event wraps LSM hooks for similar kernel object types (e.g.
> struct file, struct path...). Multiple LSM hooks can trigger the same
> Landlock event.
>
> Landlock handles nine coarse-grained actions: read, write, execute, new,
> get, remove, ioctl, lock and fcntl. Each of them abstracts LSM hook
> access control in a way that can be extended in the future.
>
> The Landlock LSM hook registration is done after other LSM to only run
> actions from user-space, via eBPF programs, if the access was granted by
> major (privileged) LSMs.
>
> Changes since v5:
> * split hooks.[ch] into hooks.[ch] and hooks_fs.[ch]
> * add more documentation
> * cosmetic fixes
>
> Changes since v4:
> * add LSM hook abstraction called Landlock event
>   * use the compiler type checking to verify hooks use by an event
>   * handle all filesystem related LSM hooks (e.g. file_permission,
> mmap_file, sb_mount...)
> * register BPF programs for Landlock just after LSM hooks registration
> * move hooks registration after other LSMs
> * add failsafes to check if a hook is not used by the kernel
> * allow partial raw value access from the context (needed for programs
>   generated by LLVM)
>
> Changes since v3:
> * split commit
> * add hooks dealing with struct inode and struct path pointers:
>   inode_permission and inode_getattr
> * add abstraction over eBPF helper arguments thanks to wrapping structs
>
> Signed-off-by: Mickaël Salaün <m...@digikod.net>
> Cc: Alexei Starovoitov <a...@kernel.org>
> Cc: Andy Lutomirski <l...@amacapital.net>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: David S. Miller <da...@davemloft.net>
> Cc: James Morris <james.l.mor...@oracle.com>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Serge E. Hallyn <se...@hallyn.com>
> ---
>  include/linux/lsm_hooks.h|   5 +
>  security/landlock/Makefile   |   4 +-
>  security/landlock/hooks.c| 115 +
>  security/landlock/hooks.h| 177 ++
>  security/landlock/hooks_fs.c | 563 
> +++
>  security/landlock/hooks_fs.h |  19 ++
>  security/landlock/init.c |  13 +
>  security/security.c  |   7 +-
>  8 files changed, 901 insertions(+), 2 deletions(-)
>  create mode 100644 security/landlock/hooks.c
>  create mode 100644 security/landlock/hooks.h
>  create mode 100644 security/landlock/hooks_fs.c
>  create mode 100644 security/landlock/hooks_fs.h
>
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index e29d4c62a3c8..884289166a0e 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1920,5 +1920,10 @@ void __init loadpin_add_hooks(void);
>  #else
>  static inline void loadpin_add_hooks(void) { };
>  #endif
> +#ifdef CONFIG_SECURITY_LANDLOCK
> +extern void __init landlock_add_hooks(void);
> +#else
> +static inline void __init landlock_add_hooks(void) { }
> +#endif /* CONFIG_SECURITY_LANDLOCK */
>
>  #endif /* ! __LINUX_LSM_HOOKS_H */
> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
> index 7205f9a7a2ee..c0db504a6335 100644
> --- a/security/landlock/Makefile
> +++ b/security/landlock/Makefile
> @@ -1,3 +1,5 @@
> +ccflags-$(CONFIG_SECURITY_LANDLOCK) += -Werror=unused-function

Why is this needed? If it can't be avoided, a comment should exist
here explaining why.

> [...]
> @@ -127,3 +132,11 @@ static struct bpf_prog_type_list bpf_landlock_type 
> __ro_after_init = {
> .ops = &bpf_landlock_ops,
> .type = BPF_PROG_TYPE_LANDLOCK,
>  };
> +
> +void __init landlock_add_hooks(void)
> +{
> +   pr_info("landlock: Version %u", LANDLOCK_VERSION);
> +   landlock_add_hooks_fs();
> +   security_add_hooks(NULL, 0, "landlock");
> +   bpf_register_prog_type(&bpf_landlock_type);

I'm confused by the separation of hook registration here. The call to
security_add_hooks with count=0 is especially weird. Why isn't this
just a single call with security_add_hooks(landlock_hooks,
ARRAY_SIZE(landlock_hooks), "landlock")?
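
To make the shape of that suggestion concrete, a small userspace analogue
follows. Nothing in it is the real LSM API: struct demo_hook,
demo_add_hooks() and the fixed-size table are invented for the sketch.
It only shows a hook array handed over in one call together with its
ARRAY_SIZE(), and that a call with a count of 0 registers nothing.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))
#define MAX_HOOKS	16

/* Hypothetical hook descriptor, not struct security_hook_list. */
struct demo_hook {
	const char *name;
	int (*fn)(void);
};

static const struct demo_hook *registered[MAX_HOOKS];
static size_t registered_count;

/* Append count hooks to the table; returns how many were added. */
static size_t demo_add_hooks(const struct demo_hook *hooks, size_t count)
{
	size_t i;

	assert(hooks || count == 0);

	for (i = 0; i < count && registered_count < MAX_HOOKS; i++)
		registered[registered_count++] = &hooks[i];
	return i;
}

static int demo_file_open(void) { return 0; }
static int demo_inode_permission(void) { return 0; }

static const struct demo_hook demo_hooks[] = {
	{ "file_open", demo_file_open },
	{ "inode_permission", demo_inode_permission },
};
```

A caller would then register everything with
demo_add_hooks(demo_hooks, ARRAY_SIZE(demo_hooks)).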

> +}
> diff --git a/security/security.c b/security/security.c
> index d0e07f269b2d..a3e9f4625991 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -64,10 +64,15 @@ int __init security_init(void)
> loadpin_add_hooks();
>
> /*
> -* Load all the remaining security modules.
> +* Load all remaining privileged security modules.
>  */
> do_security_initcalls();
>
> +   /*
> +* Load potentially-unprivileged security modules at the end.
> +*/
> +   landlock_add_hooks();

Oh, is this to make it last in the list? Is there a reason it has to be last?

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH net-next v6 08/11] bpf: Add a Landlock sandbox example

2017-04-18 Thread Kees Cook
On Tue, Mar 28, 2017 at 4:46 PM, Mickaël Salaün <m...@digikod.net> wrote:
> Add a basic sandbox tool to create a process isolated from some part of
> the system. This sandbox creates a read-only environment. It is only
> allowed to write to a character device such as a TTY:
>
>   # :> X
>   # echo $?
>   0
>   # ./samples/bpf/landlock1 /bin/sh -i
>   Launching a new sandboxed process.
>   # :> Y
>   cannot create Y: Operation not permitted
>
> Changes since v5:
> * cosmetic fixes
> * rebase
>
> Changes since v4:
> * write Landlock rule in C and compiled it with LLVM
> * remove cgroup handling
> * remove path handling: only handle a read-only environment
> * remove errno return codes
>
> Changes since v3:
> * remove seccomp and origin field: completely free from seccomp programs
> * handle more FS-related hooks
> * handle inode hooks and directory traversal
> * add faked but consistent view thanks to ENOENT
> * add /lib64 in the example
> * fix spelling
> * rename some types and definitions (e.g. SECCOMP_ADD_LANDLOCK_RULE)
>
> Changes since v2:
> * use BPF_PROG_ATTACH for cgroup handling
>
> Signed-off-by: Mickaël Salaün <m...@digikod.net>
> Cc: Alexei Starovoitov <a...@kernel.org>
> Cc: Andy Lutomirski <l...@amacapital.net>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: David S. Miller <da...@davemloft.net>
> Cc: James Morris <james.l.mor...@oracle.com>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Serge E. Hallyn <se...@hallyn.com>
> ---
>  samples/bpf/Makefile |   4 ++
>  samples/bpf/bpf_load.c   |  31 +++--
>  samples/bpf/landlock1_kern.c |  46 +++
>  samples/bpf/landlock1_user.c | 102 
> +++
>  4 files changed, 179 insertions(+), 4 deletions(-)
>  create mode 100644 samples/bpf/landlock1_kern.c
>  create mode 100644 samples/bpf/landlock1_user.c
>
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index d42b495b0992..4743674a3fa3 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -36,6 +36,7 @@ hostprogs-y += lwt_len_hist
>  hostprogs-y += xdp_tx_iptunnel
>  hostprogs-y += test_map_in_map
>  hostprogs-y += per_socket_stats_example
> +hostprogs-y += landlock1
>
>  # Libbpf dependencies
>  LIBBPF := ../../tools/lib/bpf/bpf.o
> @@ -76,6 +77,7 @@ lwt_len_hist-objs := bpf_load.o $(LIBBPF) 
> lwt_len_hist_user.o
>  xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o
>  test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o
>  per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
> +landlock1-objs := bpf_load.o $(LIBBPF) landlock1_user.o
>
>  # Tell kbuild to always build the programs
>  always := $(hostprogs-y)
> @@ -111,6 +113,7 @@ always += lwt_len_hist_kern.o
>  always += xdp_tx_iptunnel_kern.o
>  always += test_map_in_map_kern.o
>  always += cookie_uid_helper_example.o
> +always += landlock1_kern.o
>
>  HOSTCFLAGS += -I$(objtree)/usr/include
>  HOSTCFLAGS += -I$(srctree)/tools/lib/
> @@ -146,6 +149,7 @@ HOSTLOADLIBES_tc_l2_redirect += -l elf
>  HOSTLOADLIBES_lwt_len_hist += -l elf
>  HOSTLOADLIBES_xdp_tx_iptunnel += -lelf
>  HOSTLOADLIBES_test_map_in_map += -lelf
> +HOSTLOADLIBES_landlock1 += -lelf
>
>  # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on 
> cmdline:
>  #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc 
> CLANG=~/git/llvm/build/bin/clang
> diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> index 4a3460d7c01f..3713e5e2e998 100644
> --- a/samples/bpf/bpf_load.c
> +++ b/samples/bpf/bpf_load.c
> @@ -29,6 +29,8 @@
>
>  static char license[128];
>  static int kern_version;
> +static union bpf_prog_subtype subtype = {};
> +static bool has_subtype;
>  static bool processed_sec[128];
>  char bpf_log_buf[BPF_LOG_BUF_SIZE];
>  int map_fd[MAX_MAPS];
> @@ -68,6 +70,7 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
> bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
> bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
> bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
> +   bool is_landlock = strncmp(event, "landlock", 8) == 0;
> size_t insns_cnt = size / sizeof(struct bpf_insn);
> enum bpf_prog_type prog_type;
> char buf[256];
> @@ -94,6 +97,13 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
> prog_type = BPF_PROG_TYPE_CGROUP_SKB;
> } else if (is_cgroup_sk) {
> prog_type = BPF

Re: [PATCH net-next v6 04/11] landlock: Add LSM hooks related to filesystem

2017-04-18 Thread Kees Cook
On Tue, Apr 18, 2017 at 3:44 PM, Mickaël Salaün <m...@digikod.net> wrote:
>
> On 19/04/2017 00:17, Kees Cook wrote:
>> On Tue, Mar 28, 2017 at 4:46 PM, Mickaël Salaün <m...@digikod.net> wrote:
>>> Handle 33 filesystem-related LSM hooks for the Landlock filesystem
>>> event: LANDLOCK_SUBTYPE_EVENT_FS.
>>>
>>> A Landlock event wraps LSM hooks for similar kernel object types (e.g.
>>> struct file, struct path...). Multiple LSM hooks can trigger the same
>>> Landlock event.
>>>
>>> Landlock handles nine coarse-grained actions: read, write, execute, new,
>>> get, remove, ioctl, lock and fcntl. Each of them abstracts LSM hook
>>> access control in a way that can be extended in the future.
>>>
>>> The Landlock LSM hook registration is done after other LSM to only run
>>> actions from user-space, via eBPF programs, if the access was granted by
>>> major (privileged) LSMs.
>>>
>>> Changes since v5:
>>> * split hooks.[ch] into hooks.[ch] and hooks_fs.[ch]
>>> * add more documentation
>>> * cosmetic fixes
>>>
>>> Changes since v4:
>>> * add LSM hook abstraction called Landlock event
>>>   * use the compiler type checking to verify hooks use by an event
>>>   * handle all filesystem related LSM hooks (e.g. file_permission,
>>> mmap_file, sb_mount...)
>>> * register BPF programs for Landlock just after LSM hooks registration
>>> * move hooks registration after other LSMs
>>> * add failsafes to check if a hook is not used by the kernel
>>> * allow partial raw value access from the context (needed for programs
>>>   generated by LLVM)
>>>
>>> Changes since v3:
>>> * split commit
>>> * add hooks dealing with struct inode and struct path pointers:
>>>   inode_permission and inode_getattr
>>> * add abstraction over eBPF helper arguments thanks to wrapping structs
>>>
>>> Signed-off-by: Mickaël Salaün <m...@digikod.net>
>>> Cc: Alexei Starovoitov <a...@kernel.org>
>>> Cc: Andy Lutomirski <l...@amacapital.net>
>>> Cc: Daniel Borkmann <dan...@iogearbox.net>
>>> Cc: David S. Miller <da...@davemloft.net>
>>> Cc: James Morris <james.l.mor...@oracle.com>
>>> Cc: Kees Cook <keesc...@chromium.org>
>>> Cc: Serge E. Hallyn <se...@hallyn.com>
>>> ---
>>>  include/linux/lsm_hooks.h|   5 +
>>>  security/landlock/Makefile   |   4 +-
>>>  security/landlock/hooks.c| 115 +
>>>  security/landlock/hooks.h| 177 ++
>>>  security/landlock/hooks_fs.c | 563 
>>> +++
>>>  security/landlock/hooks_fs.h |  19 ++
>>>  security/landlock/init.c |  13 +
>>>  security/security.c  |   7 +-
>>>  8 files changed, 901 insertions(+), 2 deletions(-)
>>>  create mode 100644 security/landlock/hooks.c
>>>  create mode 100644 security/landlock/hooks.h
>>>  create mode 100644 security/landlock/hooks_fs.c
>>>  create mode 100644 security/landlock/hooks_fs.h
>>>
>>> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
>>> index e29d4c62a3c8..884289166a0e 100644
>>> --- a/include/linux/lsm_hooks.h
>>> +++ b/include/linux/lsm_hooks.h
>>> @@ -1920,5 +1920,10 @@ void __init loadpin_add_hooks(void);
>>>  #else
>>>  static inline void loadpin_add_hooks(void) { };
>>>  #endif
>>> +#ifdef CONFIG_SECURITY_LANDLOCK
>>> +extern void __init landlock_add_hooks(void);
>>> +#else
>>> +static inline void __init landlock_add_hooks(void) { }
>>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>>>
>>>  #endif /* ! __LINUX_LSM_HOOKS_H */
>>> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
>>> index 7205f9a7a2ee..c0db504a6335 100644
>>> --- a/security/landlock/Makefile
>>> +++ b/security/landlock/Makefile
>>> @@ -1,3 +1,5 @@
>>> +ccflags-$(CONFIG_SECURITY_LANDLOCK) += -Werror=unused-function
>>
>> Why is this needed? If it can't be avoided, a comment should exist
>> here explaining why.
>
> This is useful to catch defined but unused hooks: error out if a
> HOOK_NEW_FS(foo) is not used with a HOOK_INIT_FS(foo) in the struct
> security_hook_list landlock_hooks.

Gotcha. Please convert into a comment for the next revision. :)

>
>>
>>> [...]
>>> @@ -127,3 +132,11 @@ static struct bpf_prog_type_list bpf_landlock_type 
>>>
