from:"Kees Cook"

Re: Freeze when using ipheth+IPsec+IPv6

2018-11-29 Thread Kees Cook

17] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W 
> (1.27 ) 09/12/2017
> [ 1563.658019] RIP: 0010:tcp_recvmsg+0x647/0xb40
> [ 1563.658020] RSP: 0018:b77e010f7cf8 EFLAGS: 00010282
> [ 1563.658022] RAX:  RBX: 416bcf42 RCX: 
> 0006
> [ 1563.658023] RDX: 0007 RSI: 0086 RDI: 
> a37a8dc95610
> [ 1563.658024] RBP: b77e010f7db8 R08: 0013fd88 R09: 
> 0004
> [ 1563.658026] R10: a37a3b1180c8 R11: 0001 R12: 
> a37a81d40e00
> [ 1563.658027] R13: a37a3b118000 R14: a37a3b118524 R15: 
> 
> [ 1563.658028] FS:  738f795c0700() GS:a37a8dc8() 
> knlGS:
> [ 1563.658030] CS:  0010 DS:  ES:  CR0: 80050033
> [ 1563.658031] CR2: 7f967818b048 CR3: 00024200c003 CR4: 
> 003606e0
> [ 1563.658032] Call Trace:
> [ 1563.658040]  inet_recvmsg+0x5c/0x110
> [ 1563.658046]  __sys_recvfrom+0xf2/0x160
> [ 1563.658054]  __x64_sys_recvfrom+0x1f/0x30
> [ 1563.658060]  do_syscall_64+0x72/0x1c0
> [ 1563.658062]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1563.658065] RIP: 0033:0x73901a71deae
> [ 1563.658070] RSP: 002b:738f795bee50 EFLAGS: 0246 ORIG_RAX: 
> 002d
> [ 1563.658080] RAX: ffda RBX: 0028 RCX: 
> 73901a71deae
> [ 1563.658085] RDX: 0404 RSI: 738f087955a7 RDI: 
> 0028
> [ 1563.658089] RBP:  R08:  R09: 
> 
> [ 1563.658092] R10:  R11: 0246 R12: 
> 738f087955a7
> [ 1563.658097] R13: 0404 R14:  R15: 
> 
> [ 1563.658102] Code: ff ff 41 8b 8d 20 05 00 00 48 c7 c7 40 47 05 ae 4c 89 95 
> 48 ff ff ff 41 8b 54 24 28 44 8b 85 70 ff ff ff 41 8b 36 e8 49 0d 93 ff <0f> 
> 0b 4c 8b 95 48 ff ff ff e9 89 fb ff ff 49 8b 55 60 83 e2 02
> [ 1563.658219] ---[ end trace e7da03c87ec5c408 ]---
>
> and finally a NULL pointer dereference:
>
> [ 1563.658223] BUG: unable to handle kernel NULL pointer dereference at 
> 0028
> [ 1563.658230] PGD 0 P4D 0
> [ 1563.658234] Oops:  [#1] PREEMPT SMP PTI
> [ 1563.658237] Modules linked in: esp4 xfrm6_mode_tunnel xfrm4_mode_tunnel 
> bnep ipheth rtsx_pci_sdmmc snd_hda_codec_realtek iwlmvm snd_hda_codec_generic 
> snd_hda_codec_hdmi snd_hda_intel iwlwifi snd_hda_codec snd_hwdep rtsx_pci 
> snd_hda_core snd_pcm thinkpad_acpi efivarfs input_leds
> [ 1563.658253] CPU: 1 PID: 2177 Comm: pool Tainted: GW   T 4.17.0 
> #22
> [ 1563.658255] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W 
> (1.27 ) 09/12/2017
> [ 1563.658258] RIP: 0010:tcp_recvmsg+0x1eb/0xb40
> [ 1563.658260] RSP: 0018:b77e010f7cf8 EFLAGS: 00010282
> [ 1563.658263] RAX:  RBX: 416bcf42 RCX: 
> 0006
> [ 1563.658265] RDX: 0007 RSI: 0086 RDI: 
> a37a8dc95610
> [ 1563.658268] RBP: b77e010f7db8 R08: 0013fd88 R09: 
> 0004
> [ 1563.658270] R10: a37a3b1180c8 R11: 0001 R12: 
> a37a81d40e00
> [ 1563.658272] R13: a37a3b118000 R14: a37a3b118524 R15: 
> 
> [ 1563.658275] FS:  738f795c0700() GS:a37a8dc8() 
> knlGS:
> [ 1563.658278] CS:  0010 DS:  ES:  CR0: 80050033
> [ 1563.658280] CR2: 0028 CR3: 00024200c003 CR4: 
> 003606e0
> [ 1563.658282] Call Trace:
> [ 1563.658287]  inet_recvmsg+0x5c/0x110
> [ 1563.658291]  __sys_recvfrom+0xf2/0x160
> [ 1563.658295]  __x64_sys_recvfrom+0x1f/0x30
> [ 1563.658298]  do_syscall_64+0x72/0x1c0
> [ 1563.658302]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1563.658304] RIP: 0033:0x73901a71deae
> [ 1563.658306] RSP: 002b:738f795bee50 EFLAGS: 0246 ORIG_RAX: 
> 002d
> [ 1563.658309] RAX: ffda RBX: 0028 RCX: 
> 73901a71deae
> [ 1563.658311] RDX: 0404 RSI: 738f087955a7 RDI: 
> 0028
> [ 1563.658312] RBP:  R08:  R09: 
> 
> [ 1563.658314] R10:  R11: 0246 R12: 
> 738f087955a7
> [ 1563.658316] R13: 0404 R14:  R15: 
> 
> [ 1563.658318] Code: 8b 44 24 78 41 39 d8 77 57 41 f6 44 24 34 01 0f 85 24 01 
> 00 00 45 85 ff 0f 84 40 04 00 00 49 8b 04 24 49 39 c2 0f 84 1d 02 00 00 <8b> 
> 50 28 41 8b 1e 39 d3 0f 88 f4 03 00 00 49 89 c4 29 d3 41 f6
> [ 1563.658365] RIP: tcp_recvmsg+0x1eb/0xb40 RSP: b77e010f7cf8
> [ 1563.658366] CR2: 0028
> [ 1563.658369] ---[ end trace e7da03c87ec5c409 ]---
>
> If you need more information, please ask.
>
> Regards,
> - --
> Yves-Alexis
> -BEGIN PGP SIGNATURE-
>
> iQEzBAEBCgAdFiEE8vi34Qgfo83x35gF3rYcyPpXRFsFAlsXmYcACgkQ3rYcyPpX
> RFtK6QgArIJyLOT8Lot0jdQehm9MfL6iNUWNSHbEckhK80zYQCLUodj8VQJsmeu1
> 1hZwvg/Kuw0vxLG3i744NxcbCncfoaBUkZHoUmCZxFzyUeQVviAf9EaLp6cU0JPk
> ZBSKPeoPMF9WlBKecV9O/j6T6FRjbSmV/J7esj6vNFXm3iwOh1Yp0cugpU+j+/IA
> BxWVkKWZqS/uxtXaakoYdYOvrcRRpxcGKNXHajGW2AKXqybfoPgx0tSWzQ8bpn/o
> 3NtU9AL5flo4CgmnSY+qXtwT1fnNEtSVbbRmWyrMRpzzLLzTE2v4Pn5043J1Q1C6
> EmfVzeYke69MSSGG/fqrLeEV6PzLZQ==
> =C7Mx
> -END PGP SIGNATURE-



-- 
Kees Cook

Re: [PATCH v6 10/18] x86/power/64: Remove VLA usage

2018-07-25 Thread Kees Cook

On Wed, Jul 25, 2018 at 4:32 AM, Rafael J. Wysocki  wrote:
> On Tue, Jul 24, 2018 at 6:49 PM, Kees Cook  wrote:
>> In the quest to remove all stack VLA usage from the kernel[1], this
>> removes the discouraged use of AHASH_REQUEST_ON_STACK by switching to
>> shash directly and allocating the descriptor in heap memory (which should
>> be fine: the tfm has already been allocated there too).
>>
>> [1] 
>> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
>>
>> Signed-off-by: Kees Cook 
>> Acked-by: Pavel Machek 
>
> I think I can queue this up if there are no objections from others.
>
> Do you want me to do that?

Sure thing. It looks like the other stand-alone patches like this one
are getting taken into the non-crypto trees, so that's fine.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH bpf 3/3] bpf: undo prog rejection on read-only lock failure

2018-07-02 Thread Kees Cook

On Fri, Jun 29, 2018 at 4:47 PM, Daniel Borkmann  wrote:
> On 06/29/2018 08:42 PM, Kees Cook wrote:
>> On Thu, Jun 28, 2018 at 2:34 PM, Daniel Borkmann  
>> wrote:
>>> Kees suggested that if set_memory_*() can fail, we should annotate it with
>>> __must_check, and all callers need to deal with it gracefully given those
>>> set_memory_*() markings aren't "advisory", but they're expected to actually
>>> do what they say. This might be an option worth to move forward in future
>>> but would at the same time require that set_memory_*() calls from supporting
>>> archs are guaranteed to be "atomic" in that they provide rollback if part
>>> of the range fails, once that happened, the transition from RW -> RO could
>>> be made more robust that way, while subsequent RO -> RW transition /must/
>>> continue guaranteeing to always succeed the undo part.
>>
>> Does this mean we can have BPF filters that aren't read-only then?
>> What's the situation where set_memory_ro() fails? (Can it be induced
>> by the user?)
>
> My understanding is that the cpa_process_alias() would attempt to also change
> attributes of physmap ranges, and it found that a large page had to be split
> for this but failed in doing so thus attributes couldn't be updated there due
> to page alloc error. Attempting to change the primary mapping which would be
> directly the addr passed to set_memory_ro() was however set to read-only
> despite error. While for reproduction I had a toggle on the alloc_pages() in
> split_large_page() to have it fail, I only could trigger it occasionally; I
> used the selftest suite in a loop to stress test and it hit about or twice
> over hours.

Okay, so it's pretty rare; that's good! :P

It really seems like this should be a situation that never fails, but
if we ARE going to allow failures, then I think we need to propagate
them up to callers. That means modules could fail to load in these
cases, etc, etc. Since this is a fundamental protection, we need to
either never fail to set things RO or we need to disallow operation
continuing in the face of something NOT being RO.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH bpf 3/3] bpf: undo prog rejection on read-only lock failure

2018-06-29 Thread Kees Cook

On Thu, Jun 28, 2018 at 2:34 PM, Daniel Borkmann  wrote:
> Kees suggested that if set_memory_*() can fail, we should annotate it with
> __must_check, and all callers need to deal with it gracefully given those
> set_memory_*() markings aren't "advisory", but they're expected to actually
> do what they say. This might be an option worth to move forward in future
> but would at the same time require that set_memory_*() calls from supporting
> archs are guaranteed to be "atomic" in that they provide rollback if part
> of the range fails, once that happened, the transition from RW -> RO could
> be made more robust that way, while subsequent RO -> RW transition /must/
> continue guaranteeing to always succeed the undo part.

Does this mean we can have BPF filters that aren't read-only then?
What's the situation where set_memory_ro() fails? (Can it be induced
by the user?)

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH net-next] rtnetlink: Fix null-ptr-deref in rtnl_newlink

2018-06-01 Thread Kees Cook

On Fri, Jun 1, 2018 at 1:26 AM, Eric Dumazet  wrote:
> On Fri, Jun 1, 2018 at 4:18 AM Prashant Bhole
>  wrote:
>>
>> In rtnl_newlink(), NULL check is performed on m_ops however member of
>> ops is accessed. Fixed by accessing member of m_ops instead of ops.
>>
>> [  345.432629] BUG: KASAN: null-ptr-deref in rtnl_newlink+0x400/0x1110
>> [  345.432629] Read of size 4 at addr 0088 by task ip/986
>> [  345.432629]
>> [  345.432629] CPU: 1 PID: 986 Comm: ip Not tainted 4.17.0-rc6+ #9
>> [  345.432629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.10.2-1ubuntu1 04/01/2014
>> [  345.432629] Call Trace:
>> [  345.432629]  dump_stack+0xc6/0x150
>> [  345.432629]  ? dump_stack_print_info.cold.0+0x1b/0x1b
>> [  345.432629]  ? kasan_report+0xb4/0x410
>> [  345.432629]  kasan_report.cold.4+0x8f/0x91
>> [  345.432629]  ? rtnl_newlink+0x400/0x1110
>> [  345.432629]  rtnl_newlink+0x400/0x1110
>> [...]
>>
>> Fixes: ccf8dbcd062a ("rtnetlink: Remove VLA usage")
>> Signed-off-by: Prashant Bhole 
>> ---
>>  net/core/rtnetlink.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>> index 8ca49a0e13fb..9a1ba2015ad8 100644
>> --- a/net/core/rtnetlink.c
>> +++ b/net/core/rtnetlink.c
>> @@ -2936,7 +2936,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct 
>> nlmsghdr *nlh,
>> }
>>
>> if (m_ops) {
>> -   if (ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
>> +   if (m_ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
>> return -EINVAL;
>
>
> Oh nice
>
> CC Kees Cook.

Argh. Thank you, yes.

Acked-by: Kees Cook 

-Kees

-- 
Kees Cook
Pixel Security

Re: [net-next:master 375/376] net/core/rtnetlink.c:3099:1: warning: the frame size of 1280 bytes is larger than 1024 bytes

2018-06-01 Thread Kees Cook

On Thu, May 31, 2018 at 10:07 PM, kbuild test robot  wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
> master
> head:   4b8e6ac41a594ea67ded6af6af5935f03221ea4c
> commit: ccf8dbcd062a930e64741c939ca784d15316aa0c [375/376] rtnetlink: Remove 
> VLA usage
> config: um-x86_64_defconfig (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> git checkout ccf8dbcd062a930e64741c939ca784d15316aa0c
> # save the attached .config to linux build tree
> make ARCH=um SUBARCH=x86_64
>
> All warnings (new ones prefixed by >>):
>
>net/core/rtnetlink.c: In function 'rtnl_newlink':
>>> net/core/rtnetlink.c:3099:1: warning: the frame size of 1280 bytes is 
>>> larger than 1024 bytes [-Wframe-larger-than=]
> }
> ^

The recent removal of the VLA has exposed how large it's possible for
this stack allocation to get (i.e. it was hidden from the checker
before because it was a VLA). This warning doesn't trip on regular
x86_64 because the -Wframe-larger-than is 2048. It seems like 64-bit
um should have the same 64-bit value (instead of using the 32-bit
value):

arch/um/configs/x86_64_defconfig:CONFIG_FRAME_WARN=1024

lib/Kconfig.debug:
config FRAME_WARN
int "Warn for stack frames larger than (needs gcc 4.4)"
range 0 8192
default 3072 if KASAN_EXTRA
default 2048 if GCC_PLUGIN_LATENT_ENTROPY
default 1280 if (!64BIT && PARISC)
default 1024 if (!64BIT && !PARISC)
default 2048 if 64BIT

Just dropping the defconfig there should fix it. (And I think it was
just a mistake to port that value when splitting the um defconfig in
commit e40f04d040c6 ("arch/um: make it work with defconfig and
x86_64").

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper

2018-05-10 Thread Kees Cook

On Fri, May 4, 2018 at 12:56 PM, Luis R. Rodriguez <mcg...@kernel.org> wrote:
> What a mighty short list of reviewers. Adding some more. My review below.
> I'd appreciate a Cc on future versions of these patches.

Me too, please. And likely linux-security-module@ and Jessica too.

> On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
>> Introduce helper:
>> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
>> struct umh_info {
>>struct file *pipe_to_umh;
>>struct file *pipe_from_umh;
>>pid_t pid;
>> };
>>
>> that GPLed kernel modules (signed or unsigned) can use it to execute part
>> of its own data as swappable user mode process.
>>
>> The kernel will do:
>> - mount "tmpfs"
>> - allocate a unique file in tmpfs
>> - populate that file with [data, data + len] bytes
>> - user-mode-helper code will do_execve that file and, before the process
>>   starts, the kernel will create two unix pipes for bidirectional
>>   communication between kernel module and umh
>> - close tmpfs file, effectively deleting it
>> - the fork_usermode_blob will return zero on success and populate
>>   'struct umh_info' with two unix pipes and the pid of the user process

I'm trying to think how LSMs can successfully reason about the
resulting exec(). In the past, we've replaced "blob" style interfaces
with file-based interfaces (e.g. init_module() -> finit_module(),
kexec_load() -> kexec_file_load()) to better let the kernel understand
the origin of executable content. Here the intent is fine: we're
getting the exec from an already-loaded module, etc, etc. I'm trying
to think specifically about the interface.

How can the ultimate exec get tied back to the kernel module in a way
that the LSM can query? Right now the hooks hit during exec are:
kernel_read_file() and kernel_post_read_file() of tmpfs file,
bprm_set_creds(), bprm_check(), bprm_commiting_creds(),
bprm_commited_creds(). It seems silly to me for an LSM to perform
these checks at all since I would expect the _meaningful_ check to be
finit_module() of the module itself. Having a way for an LSM to know
the exec is tied to a kernel module would let them skip the nonsense
checks.

Since the process for doing the usermode_blob is defined by the kernel
module build/link/objcopy process, could we tighten the
fork_usermode_blob() interface to point to the kernel module itself,
rather than leaving it an open-ended "blob" interface? Given our
history of needing to replace blob interfaces with file interfaces,
I'm cautious to add a new blob interface. Maybe just pull all the
blob-finding/loading into the interface, and just make it something
like fork_usermode_kmod(struct module *mod, struct umh_info *info) ?

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v6 05/13] firmware_loader: enhance Kconfig documentation over FW_LOADER

2018-05-09 Thread Kees Cook

On Wed, May 9, 2018 at 1:55 PM, Luis R. Rodriguez <mcg...@kernel.org> wrote:
> On Tue, May 08, 2018 at 03:42:33PM -0700, Kees Cook wrote:
>> On Tue, May 8, 2018 at 11:12 AM, Luis R. Rodriguez <mcg...@kernel.org> wrote:
>> > + This used to be the default firmware loading facility, and udev 
>> > used
>> > + to listen for uvents to load firmware for the kernel. The 
>> > firmware
>> > + loading facility functionality in udev has been removed, as such 
>> > it
>> > + can no longer be relied upon as a fallback mechanism. Linux no 
>> > longer
>> > + relies on or uses a fallback mechanism in userspace. If you need 
>> > to
>> > + rely on one refer to the permissively licensed firmwared:
>>
>> Typo: firmware
>
> Thanks fixed all typos except this one, this one is meant to be firmwared as
> that is the name of the project, the url is below.
>
>>
>> > +
>> > + https://github.com/teg/firmwared

Oh! Yes, hah. :) Thanks!

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v6 00/13] firmware_loader changes for v4.18

2018-05-08 Thread Kees Cook

On Tue, May 8, 2018 at 11:12 AM, Luis R. Rodriguez <mcg...@kernel.org> wrote:
> Greg,
>
> Here is what I have queued up for the firmware_loader for v4.18. It
> includes a slew of cleanup work, and the new firmware_request_nowarn()
> which is quiet but enables the sysfs fallback mechanism. I've gone ahead
> and also queued up a few minor fixes for the firmware loader documentation
> which have come up recently. These changes are available on my git tree
> both based on linux-next [0] and Linus' latest tree [1]. Folks working
> on new developments for the firmware loader can use my linux-next
> branch 20180508-firmware_loader_for-v4.18-try2 for now.
>
> 0-day sends its blessings.
>
> The patches from Mimi's series still require a bit more discussion and
> review. The discussion over the EFI firmware fallback mechanism is still
> ongoing.
>
> As for the rename that you wanted, perhaps we can do this late in the
> merge window considering we're at rc4 now. I can prep something up for
> that later.
>
> Question, and specially rants are warmly welcomed.

I sent some typo catches, but with those fixed, please consider the
whole series:

Reviewed-by: Kees Cook <keesc...@chromium.org>

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v6 05/13] firmware_loader: enhance Kconfig documentation over FW_LOADER

2018-05-08 Thread Kees Cook

this feature enabled is to support a
> + driver which explicitly relies on this fallback mechanism. Only two
> + drivers need this today:
> +
> +   o CONFIG_LEDS_LP55XX_COMMON
> +   o CONFIG_DELL_RBU
> +
> + Outside of supporting the above drivers, another reason for needing
> + this may be that your firmware resides outside of the paths the 
> kernel
> + looks for and cannot possibily be specified using the firmware_class
> + path module parameter or kernel firmware_class path boot parameter
> + if firmware_class is built-in.
> +
> + A modern use case may be to temporarily mount a custom partition
> + during provisioning which is only accessible to userspace, and then
> + to use it to look for and fetch the required firmware. Such type of
> + driver functionality may not even ever be desirable upstream by
> + vendors, and as such is only required to be supported as an 
> interface
> + for provisioning. Since udev's firmware loading facility has been
> + removed you can use firmwared or a fork of it to customize how you
> + want to load firmware based on uevents issued.
> +
> + Enabling this option will increase your kernel image size by about
> + 13436 bytes.
> +
> + If you are unsure about this, say N here, unless you are Linux
> + distribution and need to support the above two drivers, or you are
> + certain you need to support some really custom firmware loading
> + facility in userspace.
>
>  config FW_LOADER_USER_HELPER_FALLBACK
> -   bool "Fallback user-helper invocation for firmware loading"
> -   depends on FW_LOADER
> -   select FW_LOADER_USER_HELPER
> +   bool "Force the firmware sysfs fallback mechanism when possible"
> +   depends on FW_LOADER_USER_HELPER
> help
> - This option enables / disables the invocation of user-helper
> - (e.g. udev) for loading firmware files as a fallback after the
> - direct file loading in kernel fails.  The user-mode helper is
> - no longer required unless you have a special firmware file that
> - resides in a non-standard path. Moreover, the udev support has
> - been deprecated upstream.
> + Enabling this option forces a sysfs userspace fallback mechanism
> + to be used for all firmware requests which explicitly do not 
> disable a
> + a fallback mechanism. Firmware calls which do prohibit a fallback
> + mechanism is request_firmware_direct(). This option is kept for
> +  backward compatibility purposes given this precise mechanism can 
> also
> + be enabled by setting the proc sysctl value to true:
> +
> +  /proc/sys/kernel/firmware_config/force_sysfs_fallback
>
>   If you are unsure about this, say N here.
>
> +endif # FW_LOADER
> +endmenu
> +
>  config WANT_DEV_COREDUMP
> bool
> help
> --
> 2.17.0
>


-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH] net: dsa: drop some VLAs in switch.c

2018-05-05 Thread Kees Cook

On Sat, May 5, 2018 at 8:39 AM, Andrew Lunn <and...@lunn.ch> wrote:
> On Sat, May 05, 2018 at 12:36:36PM +0200, Salvatore Mesoraca wrote:
>> 2018-03-13 21:06 GMT+01:00 Florian Fainelli <f.faine...@gmail.com>:
>> > On 03/13/2018 12:58 PM, Vivien Didelot wrote:
>> >> Hi Salvatore,
>> >>
>> >> Salvatore Mesoraca <s.mesorac...@gmail.com> writes:
>> >>
>> >>> dsa_switch's num_ports is currently fixed to DSA_MAX_PORTS. So we avoid
>> >>> 2 VLAs[1] by using DSA_MAX_PORTS instead of ds->num_ports.
>> >>>
>> >>> [1] https://lkml.org/lkml/2018/3/7/621
>> >>>
>> >>> Signed-off-by: Salvatore Mesoraca <s.mesorac...@gmail.com>
>> >>
>> >> NAK.
>> >>
>> >> We are in the process to remove hardcoded limits such as DSA_MAX_PORTS
>> >> and DSA_MAX_SWITCHES, so we have to stick with ds->num_ports.
>> >
>> > Then this means that we need to allocate a bitmap from the heap, which
>> > sounds a bit superfluous and could theoretically fail... not sure which
>> > way is better, but bumping the size to DSA_MAX_PORTS definitively does
>> > help people working on enabling -Wvla.
>>
>> Hi Florian,
>>
>> Should I consider this patch still NAKed or not?
>> Should I resend the patch with some modifications?
>
> Hi Salvatore
>
> We have been removing all uses of DSA_MAX_PORTS. I don't particularly
> like arbitrary limits on how many ports a switch can have, or how many
> switches a board can have.
>
> So i would prefer to not use DSA_MAX_PORTS here.
>
> You could make the bitmap part of the dsa_switch structure. This is
> allocated by dsa_switch_alloc() and is passed the number of ports.
> Doing the allocation there means you don't need to worry about it
> failing in dsa_switch_mdb_add() or dsa_switch_vlan_add().

Are dsa_switch_mdb_add() and dsa_switch_vlan_add() guaranteed to be
single-threaded?

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH] net: stmmac: Avoid VLA usage

2018-05-02 Thread Kees Cook

On Wed, May 2, 2018 at 1:54 AM, Jose Abreu <jose.ab...@synopsys.com> wrote:
> Hi Kees,
>
> On 01-05-2018 22:01, Kees Cook wrote:
>> In the quest to remove all stack VLAs from the kernel[1], this switches
>> the "status" stack buffer to use the existing small (8) upper bound on
>> how many queues can be checked for DMA, and adds a sanity-check just to
>> make sure it doesn't operate under pathological conditions.
>>
>> [1] 
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_CA-2B55aFzCG-2DzNmZwX4A2FQpadafLfEzK6CC-3DqPXydAacU1RqZWA-40mail.gmail.com=DwIBAg=DPL6_X_6JkXFx7AXWqB0tg=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw=TBD6a7UY2VbpPmV9LOW_eHAyg8uPq1ZPDhq93VROTVE=4fvOST1HhWmZ4lThQe-dHCJYEXNOwey00BCXOWm8tKo=
>>
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>>
>
> I rather prefer the variables declaration in reverse-tree order,
> but thats just a minor pick.

I can explicitly reorder the other variables, if you want?

> Reviewed-by: Jose Abreu <joab...@synopsys.com>

Thanks!

> PS: Is VLA warning switch in gcc already active? Because I didn't
> see this warning in my builds.

It is not. A bunch of people have been building with KCFLAGS=-Wvla to
find the VLAs and sending patches. Once we get rid of them all, we can
add the flag to the top-level Makefile.

-Kees

-- 
Kees Cook
Pixel Security

[PATCH] net: stmmac: Avoid VLA usage

2018-05-01 Thread Kees Cook

In the quest to remove all stack VLAs from the kernel[1], this switches
the "status" stack buffer to use the existing small (8) upper bound on
how many queues can be checked for DMA, and adds a sanity-check just to
make sure it doesn't operate under pathological conditions.

[1] 
http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index b65e2d144698..19bdc23fa314 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2045,7 +2045,11 @@ static void stmmac_dma_interrupt(struct stmmac_priv 
*priv)
tx_channel_count : rx_channel_count;
u32 chan;
bool poll_scheduled = false;
-   int status[channels_to_check];
+   int status[max_t(u32, MTL_MAX_TX_QUEUES, MTL_MAX_RX_QUEUES)];
+
+   /* Make sure we never check beyond our status buffer. */
+   if (WARN_ON_ONCE(channels_to_check > ARRAY_SIZE(status)))
+   channels_to_check = ARRAY_SIZE(status);
 
/* Each DMA channel can be used for rx and tx simultaneously, yet
 * napi_struct is embedded in struct stmmac_rx_queue rather than in a
-- 
2.7.4


-- 
Kees Cook
Pixel Security

[PATCH v4 ipsec-next] xfrm: remove VLA usage in __xfrm6_sort()

2018-04-25 Thread Kees Cook

In the quest to remove all stack VLA usage removed from the kernel[1],
just use XFRM_MAX_DEPTH as already done for the "class" array. In one
case, it'll do this loop up to 5, the other caller up to 6.

[1] https://lkml.org/lkml/2018/3/7/621

Co-developed-by: Andreas Christoforou <andreaschrist...@gmail.com>
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
v4:
- actually remove memset(). :)
v3:
- adjust Subject and commit log (Steffen)
- use "= { }" instead of memset() (Stefano)
v2:
- use XFRM_MAX_DEPTH for "count" array (Steffen and Mathias).
---
 net/ipv6/xfrm6_state.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index 16f434791763..5bdca3d5d6b7 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -60,11 +60,9 @@ xfrm6_init_temprop(struct xfrm_state *x, const struct 
xfrm_tmpl *tmpl,
 static int
 __xfrm6_sort(void **dst, void **src, int n, int (*cmp)(void *p), int maxclass)
 {
-   int i;
+   int count[XFRM_MAX_DEPTH] = { };
int class[XFRM_MAX_DEPTH];
-   int count[maxclass];
-
-   memset(count, 0, sizeof(count));
+   int i;
 
for (i = 0; i < n; i++) {
    int c;
-- 
2.7.4


-- 
Kees Cook
Pixel Security

Re: [PATCH v3] ath9k: dfs: Remove VLA usage

2018-04-25 Thread Kees Cook

On Wed, Apr 25, 2018 at 12:26 AM, Kalle Valo <kv...@codeaurora.org> wrote:
> I have already applied an almost identical patch:
>
> ath9k: dfs: remove accidental use of stack VLA
>
> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=ath-next=9c27489a34548913baaaf3b2776e05d4a9389e3e

Ah! Cool, no worries. I didn't see that in linux-next yet. :)

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

[PATCH v3] ath9k: dfs: Remove VLA usage

2018-04-24 Thread Kees Cook

In the quest to remove all stack VLA usage from the kernel[1], this
redefines FFT_NUM_SAMPLES as a #define instead of const int, which still
triggers gcc's VLA checking pass.

[1] https://lkml.org/lkml/2018/3/7/621

Co-developed-by: Andreas Christoforou <andreaschrist...@gmail.com>
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
v3: replace FFT_NUM_SAMPLES as a #define (Joe)
---
 drivers/net/wireless/ath/ath9k/dfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/dfs.c 
b/drivers/net/wireless/ath/ath9k/dfs.c
index 6fee9a464cce..e6e56a925121 100644
--- a/drivers/net/wireless/ath/ath9k/dfs.c
+++ b/drivers/net/wireless/ath/ath9k/dfs.c
@@ -40,8 +40,8 @@ static const int BIN_DELTA_MIN= 1;
 static const int BIN_DELTA_MAX = 10;
 
 /* we need at least 3 deltas / 4 samples for a reliable chirp detection */
-#define NUM_DIFFS 3
-static const int FFT_NUM_SAMPLES   = (NUM_DIFFS + 1);
+#define NUM_DIFFS  3
+#define FFT_NUM_SAMPLES(NUM_DIFFS + 1)
 
 /* Threshold for difference of delta peaks */
 static const int MAX_DIFF  = 2;
-- 
2.7.4


-- 
Kees Cook
Pixel Security

[PATCH v3 ipsec-next] xfrm: remove VLA usage in __xfrm6_sort()

2018-04-24 Thread Kees Cook

In the quest to remove all stack VLA usage removed from the kernel[1],
just use XFRM_MAX_DEPTH as already done for the "class" array. In one
case, it'll do this loop up to 5, the other caller up to 6.

[1] https://lkml.org/lkml/2018/3/7/621

Co-developed-by: Andreas Christoforou <andreaschrist...@gmail.com>
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
v3:
- adjust Subject and commit log (Steffen)
- use "= { }" instead of memset() (Stefano)
- reorder variables (Stefano)
v2:
- use XFRM_MAX_DEPTH for "count" array (Steffen and Mathias).
---
 net/ipv6/xfrm6_state.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index 16f434791763..eeb44b64ae7f 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -60,9 +60,9 @@ xfrm6_init_temprop(struct xfrm_state *x, const struct 
xfrm_tmpl *tmpl,
 static int
 __xfrm6_sort(void **dst, void **src, int n, int (*cmp)(void *p), int maxclass)
 {
-   int i;
+   int count[XFRM_MAX_DEPTH] = { };
int class[XFRM_MAX_DEPTH];
-   int count[maxclass];
+   int i;
 
memset(count, 0, sizeof(count));
 
-- 
2.7.4


-- 
Kees Cook
Pixel Security

[PATCH] net/tls: Remove VLA usage

2018-04-10 Thread Kees Cook

In the quest to remove VLAs from the kernel[1], this replaces the VLA
size with the only possible size used in the code, and adds a mechanism
to double-check future IV sizes.

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 net/tls/tls_sw.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 4dc766b03f00..71e79597f940 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -41,6 +41,8 @@
 #include 
 #include 
 
+#define MAX_IV_SIZETLS_CIPHER_AES_GCM_128_IV_SIZE
+
 static int tls_do_decryption(struct sock *sk,
 struct scatterlist *sgin,
 struct scatterlist *sgout,
@@ -673,7 +675,7 @@ static int decrypt_skb(struct sock *sk, struct sk_buff *skb,
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
-   char iv[TLS_CIPHER_AES_GCM_128_SALT_SIZE + tls_ctx->rx.iv_size];
+   char iv[TLS_CIPHER_AES_GCM_128_SALT_SIZE + MAX_IV_SIZE];
struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
struct scatterlist *sgin = _arr[0];
struct strp_msg *rxm = strp_msg(skb);
@@ -1094,6 +1096,12 @@ int tls_set_sw_offload(struct sock *sk, struct 
tls_context *ctx, int tx)
goto free_priv;
}
 
+   /* Sanity-check the IV size for stack allocations. */
+   if (iv_size > MAX_IV_SIZE) {
+   rc = -EINVAL;
+   goto free_priv;
+   }
+
cctx->prepend_size = TLS_HEADER_SIZE + nonce_size;
cctx->tag_size = tag_size;
cctx->overhead_size = cctx->prepend_size + cctx->tag_size;
-- 
2.7.4


-- 
Kees Cook
Pixel Security

[PATCH] ibmvnic: Define vnic_login_client_data name field as unsized array

2018-04-10 Thread Kees Cook

The "name" field of struct vnic_login_client_data is a char array of
undefined length. This should be written as "char name[]" so the compiler
can make better decisions about the field (for example, not assuming
it's a single character). This was noticed while trying to tighten the
CONFIG_FORTIFY_SOURCE checking.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index aad5658d79d5..35fbb41cd2d4 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3170,7 +3170,7 @@ static int send_version_xchg(struct ibmvnic_adapter 
*adapter)
 struct vnic_login_client_data {
u8  type;
__be16  len;
-   charname;
+   charname[];
 } __packed;
 
 static int vnic_client_data_len(struct ibmvnic_adapter *adapter)
@@ -3199,21 +3199,21 @@ static void vnic_add_client_data(struct ibmvnic_adapter 
*adapter,
vlcd->type = 1;
len = strlen(os_name) + 1;
vlcd->len = cpu_to_be16(len);
-   strncpy(>name, os_name, len);
-   vlcd = (struct vnic_login_client_data *)((char *)>name + len);
+   strncpy(vlcd->name, os_name, len);
+   vlcd = (struct vnic_login_client_data *)(vlcd->name + len);
 
/* Type 2 - LPAR name */
vlcd->type = 2;
len = strlen(utsname()->nodename) + 1;
vlcd->len = cpu_to_be16(len);
-   strncpy(>name, utsname()->nodename, len);
-   vlcd = (struct vnic_login_client_data *)((char *)>name + len);
+   strncpy(vlcd->name, utsname()->nodename, len);
+   vlcd = (struct vnic_login_client_data *)(vlcd->name + len);
 
/* Type 3 - device name */
vlcd->type = 3;
len = strlen(adapter->netdev->name) + 1;
vlcd->len = cpu_to_be16(len);
-   strncpy(>name, adapter->netdev->name, len);
+   strncpy(vlcd->name, adapter->netdev->name, len);
 }
 
 static int send_login(struct ibmvnic_adapter *adapter)
-- 
2.7.4


-- 
Kees Cook
Pixel Security

Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-22 Thread Kees Cook

On Tue, Mar 20, 2018 at 4:23 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Sat, Mar 17, 2018 at 1:07 PM, Kees Cook <keesc...@chromium.org> wrote:
>>
>> No luck! :( gcc 4.4 refuses to play along. And, hilariously, not only
>> does it not change the complaint about __builtin_choose_expr(), it
>> also thinks that's a VLA now.
>
> Hmm. So thanks to the diseased mind of Martin Uecker, there's a better
> test for "__is_constant()":
>
>   /* Glory to Martin Uecker <martin.uec...@med.uni-goettingen.de> */
>   #define __is_constant(a) \
> (sizeof(int) == sizeof(*(1 ? ((void*)((a) * 0l)) : (int*)1)))
>
> that is actually *specified* by the C standard to work, and doesn't
> even depend on any gcc extensions.

I feel we risk awakening Cthulhu with this. :)

> The reason is some really subtle pointer conversion rules, where the
> type of the ternary operator will depend on whether one of the
> pointers is NULL or not.
>
> And the definition of NULL, in turn, very much depends on "integer
> constant expression that has the value 0".
>
> Are you willing to do one final try on a generic min/max? Same as my
> last patch, but using the above __is_constant() test instead of
> __builtin_constant_p?

So, this time it's not a catastrophic failure with gcc 4.4. Instead it
fails in 11 distinct places:

$ grep "first argument to ‘__builtin_choose_expr’ not a constant" log
| cut -d: -f1-2
crypto/ablkcipher.c:71
crypto/blkcipher.c:70
crypto/skcipher.c:95
mm/percpu.c:2453
net/ceph/osdmap.c:1545
net/ceph/osdmap.c:1756
net/ceph/osdmap.c:1763
mm/kmemleak.c:1371
mm/kmemleak.c:1403
drivers/infiniband/hw/hfi1/pio_copy.c:421
drivers/infiniband/hw/hfi1/pio_copy.c:547

Seems like it doesn't like void * arguments:

mm/percpu.c:
void *ptr;
...
base = min(ptr, base);


mm/kmemleak.c:
static void scan_large_block(void *start, void *end)
...
next = min(start + MAX_SCAN_SIZE, end);


I'll poke a bit more...

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-17 Thread Kees Cook

On Sat, Mar 17, 2018 at 11:52 AM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> So the above is completely insane, bit there is actually a chance that
> using that completely crazy "x -> sizeof(char[x])" conversion actually
> helps, because it really does have a (very odd) evaluation-time
> change.  sizeof() has to be evaluated as part of the constant
> expression evaluation, in ways that "__builtin_constant_p()" isn't
> specified to be done.
>
> But it is also definitely me grasping at straws. If that doesn't work
> for 4.4, there's nothing else I can possibly see.

No luck! :( gcc 4.4 refuses to play along. And, hilariously, not only
does it not change the complaint about __builtin_choose_expr(), it
also thinks that's a VLA now.

./include/linux/mm.h: In function ‘get_mm_hiwater_rss’:
./include/linux/mm.h:1567: warning: variable length array is used
./include/linux/mm.h:1567: error: first argument to
‘__builtin_choose_expr’ not a constant

6.8 is happy with it (of course).

I do think the earlier version (without the
sizeof-hiding-builting_constant_p) provides a template for a
const_max() that both you and Rasmus would be happy with, though!

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-17 Thread Kees Cook

On Fri, Mar 16, 2018 at 12:27 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> Kees - is there some online "gcc-4.4 checker" somewhere? This does
> seem to work with my gcc. I actually tested some of those files you
> pointed at now.

Unfortunately my 4.4 test fails quickly:

./include/linux/jiffies.h: In function ‘jiffies_delta_to_clock_t’:
./include/linux/jiffies.h:444: error: first argument to
‘__builtin_choose_expr’ not a constant

static inline clock_t jiffies_delta_to_clock_t(long delta)
{
return jiffies_to_clock_t(max(0L, delta));
}

I think this is the same problem of using __builtin_constant_p() in
4.4 that we hit earlier? :(

-Kees

-- 
Kees Cook
Pixel Security

[PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-15 Thread Kees Cook

Patch 1 adds const_max_t(), patch 2 uses it in all the places max()
was used for stack arrays. Commit log from patch 1:

---snip---
kernel.h: Introduce const_max_t() for VLA removal

In the effort to remove all VLAs from the kernel[1], it is desirable to
build with -Wvla. However, this warning is overly pessimistic, in that
it is only happy with stack array sizes that are declared as constant
expressions, and not constant values. One case of this is the evaluation
of the max() macro which, due to its construction, ends up converting
constant expression arguments into a constant value result. Attempts
to adjust the behavior of max() ran afoul of version-dependent compiler
behavior[2].

To work around this and still gain -Wvla coverage, this patch introduces
a new macro, const_max_t(), for use in these cases of stack array size
declaration, where the constant expressions are retained. Since this means
losing the double-evaluation protections of the max() macro, this macro is
designed to explicitly fail if used on non-constant arguments.

Older compilers will fail with the unhelpful message:

error: first argument to ‘__builtin_choose_expr’ not a constant

Newer compilers will fail with a hopefully more helpful message:

error: call to ‘__error_non_const_arg’ declared with attribute error: 
const_max_t() used with non-constant expression

To gain the ability to compare differing types, the desired type must
be explicitly declared, as with the existing max_t() macro. This is
needed when comparing different enum types and to allow things like:

int foo[const_max_t(size_t, 6, sizeof(something))];

[1] https://lkml.org/lkml/2018/3/7/621
[2] https://lkml.org/lkml/2018/3/10/170
---eol---

Hopefully this reads well as a summary from all the things that got tried.
I've tested this on allmodconfig builds with gcc 4.4.4 and 6.3.0, with and
without -Wvla.

-Kees

v5: explicit type argument
v4: forced size_t type

[PATCH v5 2/2] Remove false-positive VLAs when using max()

2018-03-15 Thread Kees Cook

As part of removing VLAs from the kernel[1], we want to build with -Wvla,
but it is overly pessimistic and only accepts constant expressions for
stack array sizes, instead of also constant values. The max() macro
triggers the warning, so this refactors these uses of max() to use the
new const_max() instead.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/input/touchscreen/cyttsp4_core.c |  2 +-
 fs/btrfs/tree-checker.c  |  3 ++-
 lib/vsprintf.c   |  5 +++--
 net/ipv4/proc.c  |  8 
 net/ipv6/proc.c  | 11 +--
 5 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/input/touchscreen/cyttsp4_core.c 
b/drivers/input/touchscreen/cyttsp4_core.c
index 727c3232517c..7fb9bd48e41c 100644
--- a/drivers/input/touchscreen/cyttsp4_core.c
+++ b/drivers/input/touchscreen/cyttsp4_core.c
@@ -868,7 +868,7 @@ static void cyttsp4_get_mt_touches(struct cyttsp4_mt_data 
*md, int num_cur_tch)
struct cyttsp4_touch tch;
int sig;
int i, j, t = 0;
-   int ids[max(CY_TMA1036_MAX_TCH, CY_TMA4XX_MAX_TCH)];
+   int ids[const_max_t(size_t, CY_TMA1036_MAX_TCH, CY_TMA4XX_MAX_TCH)];
 
memset(ids, 0, si->si_ofs.tch_abs[CY_TCH_T].max * sizeof(int));
for (i = 0; i < num_cur_tch; i++) {
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index c3c8d48f6618..d83244e3821f 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -341,7 +341,8 @@ static int check_dir_item(struct btrfs_root *root,
 */
if (key->type == BTRFS_DIR_ITEM_KEY ||
key->type == BTRFS_XATTR_ITEM_KEY) {
-   char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)];
+   char namebuf[const_max_t(size_t, BTRFS_NAME_LEN,
+XATTR_NAME_MAX)];
 
read_extent_buffer(leaf, namebuf,
(unsigned long)(di + 1), name_len);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index d7a708f82559..12ff57a36171 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -744,8 +744,9 @@ char *resource_string(char *buf, char *end, struct resource 
*res,
 #define FLAG_BUF_SIZE  (2 * sizeof(res->flags))
 #define DECODED_BUF_SIZE   sizeof("[mem - 64bit pref window disabled]")
 #define RAW_BUF_SIZE   sizeof("[mem - flags 0x]")
-   char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
-2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
+   char sym[const_max_t(size_t,
+2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
+2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
 
char *p = sym, *pend = sym + sizeof(sym);
int decode = (fmt[0] == 'R') ? 1 : 0;
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index dc5edc8f7564..7f5c3b40dac9 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -46,7 +46,7 @@
 #include 
 #include 
 
-#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
+#define TCPUDP_MIB_MAX const_max_t(size_t, UDP_MIB_MAX, TCP_MIB_MAX)
 
 /*
  * Report socket allocation statistics [m...@utu.fi]
@@ -404,7 +404,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
struct net *net = seq->private;
int i;
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
seq_puts(seq, "\nTcp:");
for (i = 0; snmp4_tcp_list[i].name; i++)
@@ -421,7 +421,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
seq_printf(seq, " %lu", buff[i]);
}
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
snmp_get_cpu_field_batch(buff, snmp4_udp_list,
 net->mib.udp_statistics);
@@ -432,7 +432,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %lu", buff[i]);
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
/* the UDP and UDP-Lite MIBs are the same */
seq_puts(seq, "\nUdpLite:");
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index b67814242f78..b68c233de296 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -30,10 +30,9 @@
 #include 
 #include 
 
-#define MAX4(a, b, c, d) \
-   max_t(u32, max_t(u32, a, b), max_t(u32, c, d))
-#define SNMP_MIB_MAX MAX4(UDP_MIB_MAX, TCP_MIB_MAX, \
-   IPSTATS_MIB_MAX, ICMP_MIB_MAX)
+#define SNMP_MIB_MAX const_max_t(u32,  \
+   const_max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX), \
+

[PATCH v5 1/2] kernel.h: Introduce const_max_t() for VLA removal

2018-03-15 Thread Kees Cook

In the effort to remove all VLAs from the kernel[1], it is desirable to
build with -Wvla. However, this warning is overly pessimistic, in that
it is only happy with stack array sizes that are declared as constant
expressions, and not constant values. One case of this is the evaluation
of the max() macro which, due to its construction, ends up converting
constant expression arguments into a constant value result. Attempts
to adjust the behavior of max() ran afoul of version-dependent compiler
behavior[2].

To work around this and still gain -Wvla coverage, this patch introduces
a new macro, const_max_t(), for use in these cases of stack array size
declaration, where the constant expressions are retained. Since this means
losing the double-evaluation protections of the max() macro, this macro is
designed to explicitly fail if used on non-constant arguments.

Older compilers will fail with the unhelpful message:

error: first argument to ‘__builtin_choose_expr’ not a constant

Newer compilers will fail with a hopefully more helpful message:

error: call to ‘__error_non_const_arg’ declared with attribute error: 
const_max_t() used with non-constant expression

To gain the ability to compare differing types, the desired type must
be explicitly declared, as with the existing max_t() macro. This is
needed when comparing different enum types and to allow things like:

int foo[const_max_t(size_t, 6, sizeof(something))];

[1] https://lkml.org/lkml/2018/3/7/621
[2] https://lkml.org/lkml/2018/3/10/170

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 include/linux/kernel.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..e14531781568 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -820,6 +820,25 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  x, y)
 
 /**
+ * const_max_t - return maximum of two compile-time constant expressions
+ * @type: type used for evaluation
+ * @x: first compile-time constant expression
+ * @y: second compile-time constant expression
+ *
+ * This has no multi-evaluation defenses, and must only ever be used with
+ * compile-time constant expressions (for example when calculating a stack
+ * array size).
+ */
+size_t __error_non_const_arg(void) \
+__compiletime_error("const_max_t() used with non-constant expression");
+#define const_max_t(type, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y),  \
+ (type)(x) > (type)(y) ?   \
+   (type)(x) : (type)(y),  \
+ __error_non_const_arg())
+
+/**
  * min3 - return minimum of three values
  * @x: first value
  * @y: second value
-- 
2.7.4

Re: [PATCH v4 1/2] kernel.h: Introduce const_max() for VLA removal

2018-03-15 Thread Kees Cook

On Thu, Mar 15, 2018 at 4:46 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> What I'm *not* so much ok with is "const_max(5,sizeof(x))" erroring
> out, or silently causing insane behavior due to hidden subtle type
> casts..

Yup! I like it as an explicit argument. Thanks!

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v4 1/2] kernel.h: Introduce const_max() for VLA removal

2018-03-15 Thread Kees Cook

On Thu, Mar 15, 2018 at 4:34 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Thu, Mar 15, 2018 at 3:46 PM, Kees Cook <keesc...@chromium.org> wrote:
>>
>> So, AIUI, I can either get strict type checking, in which case, this
>> is rejected (which I assume there is still a desire to have):
>>
>> int foo[const_max(6, sizeof(whatever))];
>
> Ehh, yes, that looks fairly sane, and erroring out would be annoying.
>
> But maybe we should just make the type explicit, and make it "const_max_t()"?
>
> I think all the existing users are of type "max_t()" anyway due to the
> very same issue, no?

All but one are using max()[1]. One case uses max_t() to get u32.

> At least if there's an explicit type like 'size_t', then passing in
> "-1" becoming a large unsigned integer is understandable and clear,
> not just some odd silent behavior.
>
> Put another way: I think it's unacceptable that
>
>  const_max(-1,6)
>
> magically becomes a huge positive number like in that patch of yours, but
>
>  const_max_t(size_t, -1, 6)
>
> *obviously* is a huge positive number.
>
> The two things would *do* the same thing, but in the second case the
> type is explicit and visible.
>
>> due to __builtin_types_compatible_p() rejecting it, or I can construct
>> a "positive arguments only" test, in which the above is accepted, but
>> this is rejected:
>
> That sounds acceptable too, although the "const_max_t()" thing is
> presumably going to be simpler?

I much prefer explicit typing, but both you and Rasmus mentioned
wanting the int/sizeof_t mixing. I'm totally happy with const_max_t()
-- even if it makes my line-wrapping harder due to the longer name. ;)

I'll resend in a moment...

-Kees

[1] https://patchwork.kernel.org/patch/10285709/

-- 
Kees Cook
Pixel Security

Re: [PATCH v4 1/2] kernel.h: Introduce const_max() for VLA removal

2018-03-15 Thread Kees Cook

On Thu, Mar 15, 2018 at 4:17 PM, Miguel Ojeda
<miguel.ojeda.sando...@gmail.com> wrote:
>> The full one, using your naming convention:
>>
>> #define const_max(x, y)  \
>> ({   \
>> if (!__builtin_constant_p(x))\
>> __error_not_const_arg(); \
>> if (!__builtin_constant_p(y))\
>> __error_not_const_arg(); \
>> if (!__builtin_types_compatible_p(typeof(x), typeof(y))) \
>> __error_incompatible_types();\
>> if ((x) < 0) \
>> __error_not_positive_arg();  \
>> if ((y) < 0) \
>> __error_not_positive_arg();  \
>> __builtin_choose_expr((x) > (y), (x), (y));  \
>> })
>>
>
> Nevermind... gcc doesn't take that as a constant expr, even if it
> compiles as one at -O0.

Yeah, unfortunately. :(

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v4 1/2] kernel.h: Introduce const_max() for VLA removal

2018-03-15 Thread Kees Cook

On Thu, Mar 15, 2018 at 3:23 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Thu, Mar 15, 2018 at 3:16 PM, Kees Cook <keesc...@chromium.org> wrote:
>>
>> size_t __error_not_const_arg(void) \
>> __compiletime_error("const_max() used with non-compile-time constant arg");
>> #define const_max(x, y) \
>> __builtin_choose_expr(__builtin_constant_p(x) &&\
>>   __builtin_constant_p(y),  \
>>   (typeof(x))(x) > (typeof(y))(y) ? \
>> (x) : (y),  \
>>   __error_not_const_arg())
>>
>> Is typeof() forcing enums to int? Regardless, I'll put this through
>> larger testing. How does that look?
>
> Ok, that alleviates my worry about one class of insane behavior, but
> it does raise a few other questions:
>
>  - what drugs is gcc on where (typeof(x)(x)) makes a difference? Funky.

Yeah, that's why I didn't even try that originally. But in looking
back at max() again, it seemed to be the only thing missing that would
handle the enum evaluation, which turned out to be true.

>  - this does have the usual "what happen if you do
>
>  const_max(-1,sizeof(x))
>
> where the comparison will now be done in 'size_t', and -1 ends up
> being a very very big unsigned integer.
>
> Is there no way to get that type checking inserted? Maybe now is a
> good point for that __builtin_types_compatible(), and add it to the
> constness checking (and change the name of that error case function)?

So, AIUI, I can either get strict type checking, in which case, this
is rejected (which I assume there is still a desire to have):

int foo[const_max(6, sizeof(whatever))];

due to __builtin_types_compatible_p() rejecting it, or I can construct
a "positive arguments only" test, in which the above is accepted, but
this is rejected:

int foo[const_max(-1, sizeof(whatever))];

By using this eye-bleed:

size_t __error_not_const_arg(void) \
__compiletime_error("const_max() used with non-compile-time constant arg");
size_t __error_not_positive_arg(void) \
__compiletime_error("const_max() used with negative arg");
#define const_max(x, y) \
__builtin_choose_expr(__builtin_constant_p(x) &&\
  __builtin_constant_p(y),  \
__builtin_choose_expr((x) >= 0 && (y) >= 0, \
  (typeof(x))(x) > (typeof(y))(y) ? \
(x) : (y),  \
  __error_not_positive_arg()),  \
__error_not_const_arg())

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v4 1/2] kernel.h: Introduce const_max() for VLA removal

2018-03-15 Thread Kees Cook

On Thu, Mar 15, 2018 at 2:42 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Thu, Mar 15, 2018 at 12:47 PM, Kees Cook <keesc...@chromium.org> wrote:
>>
>> To gain the ability to compare differing types, the arguments are
>> explicitly cast to size_t.
>
> Ugh, I really hate this.
>
> It silently does insane things if you do
>
>const_max(-1,6)
>
> and there is nothing in the name that implies that you can't use
> negative constants.

Yeah, I didn't like that effect either. I was seeing this:

./include/linux/kernel.h:836:14: warning: comparison between ‘enum
’ and ‘enum ’ [-Wenum-compare]
  (x) > (y) ? \
  ^
./include/linux/kernel.h:838:7: note: in definition of macro ‘const_max’
  (y),  \
   ^
net/ipv6/proc.c:34:11: note: in expansion of macro ‘const_max’
   const_max(IPSTATS_MIB_MAX, ICMP_MIB_MAX))
   ^

But it turns out that just doing a typeof() fixes this, and there's no
need for the hard cast to size_t:

size_t __error_not_const_arg(void) \
__compiletime_error("const_max() used with non-compile-time constant arg");
#define const_max(x, y) \
__builtin_choose_expr(__builtin_constant_p(x) &&\
  __builtin_constant_p(y),  \
  (typeof(x))(x) > (typeof(y))(y) ? \
(x) : (y),  \
  __error_not_const_arg())

Is typeof() forcing enums to int? Regardless, I'll put this through
larger testing. How does that look?

-Kees

-- 
Kees Cook
Pixel Security

[PATCH v4 2/2] Remove false-positive VLAs when using max()

2018-03-15 Thread Kees Cook

As part of removing VLAs from the kernel[1], we want to build with -Wvla,
but it is overly pessimistic and only accepts constant expressions for
stack array sizes, instead of also constant values. The max() macro
triggers the warning, so this refactors these uses of max() to use the
new const_max() instead.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/input/touchscreen/cyttsp4_core.c |  2 +-
 fs/btrfs/tree-checker.c  |  3 ++-
 lib/vsprintf.c   |  4 ++--
 net/ipv4/proc.c  |  8 
 net/ipv6/proc.c  | 10 --
 5 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/input/touchscreen/cyttsp4_core.c 
b/drivers/input/touchscreen/cyttsp4_core.c
index 727c3232517c..f89497940051 100644
--- a/drivers/input/touchscreen/cyttsp4_core.c
+++ b/drivers/input/touchscreen/cyttsp4_core.c
@@ -868,7 +868,7 @@ static void cyttsp4_get_mt_touches(struct cyttsp4_mt_data 
*md, int num_cur_tch)
struct cyttsp4_touch tch;
int sig;
int i, j, t = 0;
-   int ids[max(CY_TMA1036_MAX_TCH, CY_TMA4XX_MAX_TCH)];
+   int ids[const_max(CY_TMA1036_MAX_TCH, CY_TMA4XX_MAX_TCH)];
 
memset(ids, 0, si->si_ofs.tch_abs[CY_TCH_T].max * sizeof(int));
for (i = 0; i < num_cur_tch; i++) {
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index c3c8d48f6618..1ddd6cc3c4fc 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -341,7 +341,8 @@ static int check_dir_item(struct btrfs_root *root,
 */
if (key->type == BTRFS_DIR_ITEM_KEY ||
key->type == BTRFS_XATTR_ITEM_KEY) {
-   char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)];
+   char namebuf[const_max(BTRFS_NAME_LEN,
+  XATTR_NAME_MAX)];
 
read_extent_buffer(leaf, namebuf,
(unsigned long)(di + 1), name_len);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index d7a708f82559..9d5610b643ce 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -744,8 +744,8 @@ char *resource_string(char *buf, char *end, struct resource 
*res,
 #define FLAG_BUF_SIZE  (2 * sizeof(res->flags))
 #define DECODED_BUF_SIZE   sizeof("[mem - 64bit pref window disabled]")
 #define RAW_BUF_SIZE   sizeof("[mem - flags 0x]")
-   char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
-2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
+   char sym[const_max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
+  2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
 
char *p = sym, *pend = sym + sizeof(sym);
int decode = (fmt[0] == 'R') ? 1 : 0;
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index dc5edc8f7564..fad6f989004e 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -46,7 +46,7 @@
 #include 
 #include 
 
-#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
+#define TCPUDP_MIB_MAX const_max(UDP_MIB_MAX, TCP_MIB_MAX)
 
 /*
  * Report socket allocation statistics [m...@utu.fi]
@@ -404,7 +404,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
struct net *net = seq->private;
int i;
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
seq_puts(seq, "\nTcp:");
for (i = 0; snmp4_tcp_list[i].name; i++)
@@ -421,7 +421,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
seq_printf(seq, " %lu", buff[i]);
}
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
snmp_get_cpu_field_batch(buff, snmp4_udp_list,
 net->mib.udp_statistics);
@@ -432,7 +432,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %lu", buff[i]);
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
/* the UDP and UDP-Lite MIBs are the same */
seq_puts(seq, "\nUdpLite:");
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index b67814242f78..58bbfc4fa7fa 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -30,10 +30,8 @@
 #include 
 #include 
 
-#define MAX4(a, b, c, d) \
-   max_t(u32, max_t(u32, a, b), max_t(u32, c, d))
-#define SNMP_MIB_MAX MAX4(UDP_MIB_MAX, TCP_MIB_MAX, \
-   IPSTATS_MIB_MAX, ICMP_MIB_MAX)
+#define SNMP_MIB_MAX const_max(const_max(UDP_MIB_MAX, TCP_MIB_MAX), \
+  const_max(IPSTATS_MIB_MAX, ICMP_MIB_MAX))
 
 static int sockstat6_seq_show(struct seq_file *seq, void *v)
 {
@@ -199,7 +197,7 @@ stati

[PATCH v4 0/2] Remove false-positive VLAs when using max()

2018-03-15 Thread Kees Cook

I'm calling this "v4" since the last effort at this was v3, even
if it's a different approach. Patch 1 adds const_max(), patch 2
uses it in all the places max() was used for stack arrays. Commit
log from patch 1:

---snip---
kernel.h: Introduce const_max() for VLA removal

In the effort to remove all VLAs from the kernel[1], it is desirable to
build with -Wvla. However, this warning is overly pessimistic, in that
it is only happy with stack array sizes that are declared as constant
expressions, and not constant values. One case of this is the evaluation
of the max() macro which, due to its construction, ends up converting
constant expression arguments into a constant value result. Attempts
to adjust the behavior of max() ran afoul of version-dependent compiler
behavior[2].

To work around this and still gain -Wvla coverage, this patch introduces
a new macro, const_max(), for use in these cases of stack array size
declaration, where the constant expressions are retained. Since this means
losing the double-evaluation protections of the max() macro, this macro is
designed to explicitly fail if used on non-constant arguments.

Older compilers will fail with the unhelpful message:

error: first argument to ‘__builtin_choose_expr’ not a constant

Newer compilers will fail with a hopefully more helpful message:

error: call to ‘__error_not_const_arg’ declared with attribute error: 
const_max() used with non-compile-time constant arg

To gain the ability to compare differing types, the arguments are
explicitly cast to size_t. Without this, some compiler versions will
fail when comparing different enum types or similar constant expression
cases. With the casting, it's possible to do things like:

int foo[const_max(6, sizeof(something))];

[1] https://lkml.org/lkml/2018/3/7/621
[2] https://lkml.org/lkml/2018/3/10/170
---eol---

Hopefully this reads well as a summary from all the things that got tried.
I've tested this on allmodconfig builds with gcc 4.4.4 and 6.3.0, with and
without -Wvla.

-Kees

[PATCH v4 1/2] kernel.h: Introduce const_max() for VLA removal

2018-03-15 Thread Kees Cook

In the effort to remove all VLAs from the kernel[1], it is desirable to
build with -Wvla. However, this warning is overly pessimistic, in that
it is only happy with stack array sizes that are declared as constant
expressions, and not constant values. One case of this is the evaluation
of the max() macro which, due to its construction, ends up converting
constant expression arguments into a constant value result. Attempts
to adjust the behavior of max() ran afoul of version-dependent compiler
behavior[2].

To work around this and still gain -Wvla coverage, this patch introduces
a new macro, const_max(), for use in these cases of stack array size
declaration, where the constant expressions are retained. Since this means
losing the double-evaluation protections of the max() macro, this macro is
designed to explicitly fail if used on non-constant arguments.

Older compilers will fail with the unhelpful message:

error: first argument to ‘__builtin_choose_expr’ not a constant

Newer compilers will fail with a hopefully more helpful message:

error: call to ‘__error_not_const_arg’ declared with attribute error: 
const_max() used with non-compile-time constant arg

To gain the ability to compare differing types, the arguments are
explicitly cast to size_t. Without this, some compiler versions will
fail when comparing different enum types or similar constant expression
cases. With the casting, it's possible to do things like:

int foo[const_max(6, sizeof(something))];

[1] https://lkml.org/lkml/2018/3/7/621
[2] https://lkml.org/lkml/2018/3/10/170

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 include/linux/kernel.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..012f588b5a25 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -820,6 +820,25 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  x, y)
 
 /**
+ * const_max - return maximum of two positive compile-time constant values
+ * @x: first compile-time constant value
+ * @y: second compile-time constant value
+ *
+ * This has no type checking nor multi-evaluation defenses, and must
+ * only ever be used with positive compile-time constant values (for
+ * example when calculating a stack array size).
+ */
+size_t __error_not_const_arg(void) \
+__compiletime_error("const_max() used with non-compile-time constant arg");
+#define const_max(x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y),  \
+ (size_t)(x) > (size_t)(y) ?   \
+   (size_t)(x) :   \
+   (size_t)(y),\
+ __error_not_const_arg())
+
+/**
  * min3 - return minimum of three values
  * @x: first value
  * @y: second value
-- 
2.7.4

Re: [RESEND] rsi: Remove stack VLA usage

2018-03-13 Thread Kees Cook

On Tue, Mar 13, 2018 at 7:11 PM, Tobin C. Harding <m...@tobin.cc> wrote:
> On Tue, Mar 13, 2018 at 11:00:47PM +0200, Andy Shevchenko wrote:
>> On Tue, Mar 13, 2018 at 10:17 PM, tcharding <m...@tobin.cc> wrote:
>> > On Mon, Mar 12, 2018 at 09:46:06AM +, Kalle Valo wrote:
>> >> tcharding <m...@tobin.cc> wrote:
>>
>> I'm pretty much sure it depends on the original email headers, like
>> above ^^^ — no name.
>> Perhaps git config on your side should be done.
>
> Thanks for the suggestion Andy but the 'tcharding' as the name was
> munged by either Kalle or patchwork.  I'm guessing patchwork.

Something you're sending from is using "tcharding" (see the email Andy
quotes). I see the headers as:

Date: Wed, 14 Mar 2018 07:17:57 +1100
From: tcharding <m...@tobin.cc>
...
Message-ID: <20180313201757.GK8631@eros>
X-Mailer: Mutt 1.5.24 (2015-08-30)
User-Agent: Mutt/1.5.24 (2015-08-30)

Your most recently email shows "Tobin C. Harding" though, and also
sent with Mutt...

Do you have multiple Mutt configurations? Is something lacking a
"From" header insertion and your MTA is filling it in for you from
your username?

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-13 Thread Kees Cook

On Tue, Mar 13, 2018 at 2:02 PM, Andrew Morton
<a...@linux-foundation.org> wrote:
> On Mon, 12 Mar 2018 21:28:57 -0700 Kees Cook <keesc...@chromium.org> wrote:
>
>> On Mon, Mar 12, 2018 at 4:57 PM, Linus Torvalds
>> <torva...@linux-foundation.org> wrote:
>> > On Mon, Mar 12, 2018 at 3:55 PM, Andrew Morton
>> > <a...@linux-foundation.org> wrote:
>> >>
>> >> Replacing the __builtin_choose_expr() with ?: works of course.
>> >
>> > Hmm. That sounds like the right thing to do. We were so myopically
>> > staring at the __builtin_choose_expr() problem that we overlooked the
>> > obvious solution.
>> >
>> > Using __builtin_constant_p() together with a ?: is in fact our common
>> > pattern, so that should be fine. The only real reason to use
>> > __builtin_choose_expr() is if you want to get the *type* to vary
>> > depending on which side you choose, but that's not an issue for
>> > min/max.
>>
>> This doesn't solve it for -Wvla, unfortunately. That was the point of
>> Josh's original suggestion of __builtin_choose_expr().
>>
>> Try building with KCFLAGS=-Wval and checking net/ipv6/proc.c:
>>
>> net/ipv6/proc.c: In function ‘snmp6_seq_show_item’:
>> net/ipv6/proc.c:198:2: warning: ISO C90 forbids array ‘buff’ whose
>> size can’t be evaluated [-Wvla]
>>   unsigned long buff[SNMP_MIB_MAX];
>>   ^~~~
>
> PITA.  Didn't we once have a different way of detecting VLAs?  Some
> post-compilation asm parser, iirc.
>
> I suppose the world wouldn't end if we had a gcc version ifdef in
> kernel.h.  We'll get to remove it in, oh, ten years.

For fixing only 6 VLAs, we don't need all this effort. When it looked
like we could get away with just a "better" max(), sure. ;)

I'll send a "const_max()" which will refuse to work on
non-constant-values (so it doesn't get accidentally used on variables
that could be exposed to double-evaluation), and will work for stack
array declarations (to avoid the overly-sensitive -Wvla checks).

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-12 Thread Kees Cook

On Mon, Mar 12, 2018 at 4:57 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Mon, Mar 12, 2018 at 3:55 PM, Andrew Morton
> <a...@linux-foundation.org> wrote:
>>
>> Replacing the __builtin_choose_expr() with ?: works of course.
>
> Hmm. That sounds like the right thing to do. We were so myopically
> staring at the __builtin_choose_expr() problem that we overlooked the
> obvious solution.
>
> Using __builtin_constant_p() together with a ?: is in fact our common
> pattern, so that should be fine. The only real reason to use
> __builtin_choose_expr() is if you want to get the *type* to vary
> depending on which side you choose, but that's not an issue for
> min/max.

This doesn't solve it for -Wvla, unfortunately. That was the point of
Josh's original suggestion of __builtin_choose_expr().

Try building with KCFLAGS=-Wval and checking net/ipv6/proc.c:

net/ipv6/proc.c: In function ‘snmp6_seq_show_item’:
net/ipv6/proc.c:198:2: warning: ISO C90 forbids array ‘buff’ whose
size can’t be evaluated [-Wvla]
  unsigned long buff[SNMP_MIB_MAX];
  ^~~~


-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH] drivers: net: wireless: ath: ath9: dfs: remove VLA usage

2018-03-10 Thread Kees Cook

On Sat, Mar 10, 2018 at 3:06 PM, Arend van Spriel
<arend.vanspr...@broadcom.com> wrote:
> On 3/9/2018 1:30 PM, Andreas Christoforou wrote:
>>
>> The kernel would like to have all stack VLA usage removed.
>
>
> I think there was a remark made earlier to give more explanation here. It
> should explain why we want "VLA on stack" removed.
>
>> Signed-off-by: Andreas Christoforou <andreaschrist...@gmail.com>
>> ---
>>   drivers/net/wireless/ath/ath9k/dfs.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath9k/dfs.c
>> b/drivers/net/wireless/ath/ath9k/dfs.c
>> index 6fee9a4..cfb0f84 100644
>> --- a/drivers/net/wireless/ath/ath9k/dfs.c
>> +++ b/drivers/net/wireless/ath/ath9k/dfs.c
>> @@ -41,7 +41,6 @@ static const int BIN_DELTA_MAX= 10;
>>
>>   /* we need at least 3 deltas / 4 samples for a reliable chirp detection
>> */
>>   #define NUM_DIFFS 3
>> -static const int FFT_NUM_SAMPLES   = (NUM_DIFFS + 1);
>>
>>   /* Threshold for difference of delta peaks */
>>   static const int MAX_DIFF = 2;
>> @@ -101,7 +100,7 @@ static bool ath9k_check_chirping(struct ath_softc *sc,
>> u8 *data,
>>  int datalen, bool is_ctl, bool is_ext)
>>   {
>> int i;
>> -   int max_bin[FFT_NUM_SAMPLES];
>> +   int max_bin[NUM_DIFFS + 1];
>
>
> Just wondering. Is this actually a VLA. FFT_NUM_SAMPLES was static const so
> not really going to show a lot of variation. This array will always have the
> same size on the stack.

The problem is that it's not a "constant expression", so the compiler
frontend still yells about it under -Wvla. I would characterize this
mainly as a fix for "accidental VLA" or "misdetected VLA" or something
like that. AIUI, there really isn't a functional change here.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v2] net: ipv6: xfrm6_state: remove VLA usage

2018-03-10 Thread Kees Cook

On Sat, Mar 10, 2018 at 12:43 AM, Stefano Brivio <sbri...@redhat.com> wrote:
> On Sat, 10 Mar 2018 09:40:44 +0200
> Andreas Christoforou <andreaschrist...@gmail.com> wrote:
>
>> diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
>> index b15075a..270a53a 100644
>> --- a/net/ipv6/xfrm6_state.c
>> +++ b/net/ipv6/xfrm6_state.c
>> @@ -62,7 +62,7 @@ __xfrm6_sort(void **dst, void **src, int n, int 
>> (*cmp)(void *p), int maxclass)
>>  {
>>   int i;
>>   int class[XFRM_MAX_DEPTH];
>> - int count[maxclass];
>> + int count[XFRM_MAX_DEPTH];
>>
>>   memset(count, 0, sizeof(count));
>
> Can you perhaps initialize 'count' instead of calling memset(), now?

Do you mean:

int count[XFRM_MAX_DEPTH] = { };

instead of the memset()?

I thought the compiler would resolve these both to the same thing? The
former looks better though! :)

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-10 Thread Kees Cook

On Fri, Mar 9, 2018 at 10:10 PM, Miguel Ojeda
<miguel.ojeda.sando...@gmail.com> wrote:
> On Sat, Mar 10, 2018 at 4:11 AM, Randy Dunlap <rdun...@infradead.org> wrote:
>> On 03/09/2018 04:07 PM, Andrew Morton wrote:
>>> On Fri, 9 Mar 2018 12:05:36 -0800 Kees Cook <keesc...@chromium.org> wrote:
>>>
>>>> When max() is used in stack array size calculations from literal values
>>>> (e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
>>>> thinks this is a dynamic calculation due to the single-eval logic, which
>>>> is not needed in the literal case. This change removes several accidental
>>>> stack VLAs from an x86 allmodconfig build:
>>>>
>>>> $ diff -u before.txt after.txt | grep ^-
>>>> -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids 
>>>> variable length array ‘ids’ [-Wvla]
>>>> -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length 
>>>> array ‘namebuf’ [-Wvla]
>>>> -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array 
>>>> ‘sym’ [-Wvla]
>>>> -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array 
>>>> ‘buff’ [-Wvla]
>>>> -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array 
>>>> ‘buff’ [-Wvla]
>>>> -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array 
>>>> ‘buff64’ [-Wvla]
>>>>
>>>> Based on an earlier patch from Josh Poimboeuf.
>>>
>>> v1, v2 and v3 of this patch all fail with gcc-4.4.4:
>>>
>>> ./include/linux/jiffies.h: In function 'jiffies_delta_to_clock_t':
>>> ./include/linux/jiffies.h:444: error: first argument to 
>>> '__builtin_choose_expr' not a constant
>>
>>
>> I'm seeing that problem with
>>> gcc --version
>> gcc (SUSE Linux) 4.8.5
>
> Same here, 4.8.5 fails. gcc 5.4.1 seems to work. I compiled a minimal
> 5.1.0 and it seems to work as well.

And sparse freaks out too:

   drivers/net/ethernet/via/via-velocity.c:97:26: sparse: incorrect
type in initializer (different address spaces) @@    expected void
*addr @@got struct mac_regs [noderef] *mac_regs
   drivers/net/ethernet/via/via-velocity.c:100:49: sparse: incorrect
type in argument 2 (different base types) @@expected restricted
pci_power_t [usertype] state @@got _t [usertype] state @@

Alright, I'm giving up on fixing max(). I'll go back to STACK_MAX() or
some other name for the simple macro. Bleh.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-09 Thread Kees Cook

On Fri, Mar 9, 2018 at 5:30 PM, Kees Cook <keesc...@chromium.org> wrote:
> --
> Kees Cook
> Pixel SecurityOn
> [...]

WTF, gmail just blasted HTML into my explicitly plain-text email?! Apologies...

-- 
Kees Cook
Pixel SecurityOn
Fri, Mar 9, 2018 at 5:30 PM, Kees Cook mailto:keesc...@chromium.org;
target="_blank">keesc...@chromium.org
wrote:On
Fri, Mar 9, 2018 at 4:38 PM, Linus Torvalds
mailto:torva...@linux-foundation.org;>torva...@linux-foundation.org
wrote:
 On Fri, Mar 9, 2018 at 4:32 PM, Andrew Morton mailto:a...@linux-foundation.org;>a...@linux-foundation.org
wrote:

 I wonder which gcc versions actually accept Kees's addition.

Ah, my old nemesis, gcc 4.4.4. *sob*

 Note that we already do have this pattern, as seen by:

 git grep -2 __builtin_choose_expr | grep -2
__builtin_constant_p

 which show drivers/pinctrl/intel/pinctrl-intel.h, so the pattern
 already exists current kernels and _works_ - it apparently just
 doesn't work in slightly more complicated cases.

Fun. Yeah, so all the PIN_GROUP() callers have either a literal or an
array name for that argument, so the argument to
__builtin_constant_p() isn't complex.

git grep '\bPIN_GROUP\b' | egrep -v '(1|2|3|true|false)\)'

 It's one reason why I wondered if simplifying the expression to have
 just that single __builtin_constant_p() might not end up working..

Yeah, it seems like it doesn't bail out as "false" for complex
expressions given to __builtin_constant_p().

If no magic solution, then which of these?

- go back to my original "multi-eval max only for constants" macro (meh)
- add gcc version checks around this and similarly for -Wvla in the
future (eww)
- raise gcc version (yikes)

-Kees

--
Kees Cook
Pixel Securitydiv class="gmail_extra"brdiv
class="gmail_quote"On
Fri, Mar 9, 2018 at 4:38 PM, Linus Torvalds span
dir="ltr"lt;a
href="mailto:mailto:torva...@linux-foundation.org;>torvalds@linux-foundation.org"
target="_blank"mailto:torva...@linux-foundation.org;>torvalds@linux-foundation.org/agt;/span
wrote:brblockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"span
class=""On
Fri, Mar 9, 2018 at 4:32 PM, Andrew Morton lt;a
href="mailto:mailto:a...@linux-foundation.org;>akpm@linux-foundation.org"mailto:a...@linux-foundation.org;>akpm@linux-foundation.org/agt;
wrote:br
gt;br
gt; I wonder which gcc versions actually accept Kees's
addition.br
br
/spanNote that we already do have this pattern, as seen
by:br
br
nbsp; git grep -2nbsp; __builtin_choose_expr | grep -2
__builtin_constant_pbr
br
which show drivers/pinctrl/intel/pinctrl-wbrintel.h, so
the patternbr
already exists current kernels and _works_ - it apparently justbr
doesn't work in slightly more complicated cases.br
br
It's one reason why I wondered if simplifying the expression to
havebr
just that single __builtin_constant_p() might not end up working..br
span
class="HOEnZb"font color="#88"br
nbsp; nbsp; nbsp; nbsp; nbsp; nbsp;
nbsp; Linusbr
/font/span/blockquote/divbrbr
clear="all"divbr/div--
brdiv class="gmail_signature"
data-smartmail="gmail_signature"Kees
CookbrPixel Security/div
/div
--
Kees
CookPixel Security

Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-09 Thread Kees Cook

On Fri, Mar 9, 2018 at 4:38 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Fri, Mar 9, 2018 at 4:32 PM, Andrew Morton <a...@linux-foundation.org> 
> wrote:
>>
>> I wonder which gcc versions actually accept Kees's addition.

Ah, my old nemesis, gcc 4.4.4. *sob*

> Note that we already do have this pattern, as seen by:
>
>   git grep -2  __builtin_choose_expr | grep -2 __builtin_constant_p
>
> which show drivers/pinctrl/intel/pinctrl-intel.h, so the pattern
> already exists current kernels and _works_ - it apparently just
> doesn't work in slightly more complicated cases.

Fun. Yeah, so all the PIN_GROUP() callers have either a literal or an
array name for that argument, so the argument to
__builtin_constant_p() isn't complex.

git grep '\bPIN_GROUP\b' | egrep -v '(1|2|3|true|false)\)'

> It's one reason why I wondered if simplifying the expression to have
> just that single __builtin_constant_p() might not end up working..

Yeah, it seems like it doesn't bail out as "false" for complex
expressions given to __builtin_constant_p().

If no magic solution, then which of these?

- go back to my original "multi-eval max only for constants" macro (meh)
- add gcc version checks around this and similarly for -Wvla in the future (eww)
- raise gcc version (yikes)

-Kees

-- 
Kees Cook
Pixel SecurityOn
Fri, Mar 9, 2018 at 4:38 PM, Linus Torvalds mailto:torva...@linux-foundation.org;
target="_blank">torva...@linux-foundation.org
wrote:On
Fri, Mar 9, 2018 at 4:32 PM, Andrew Morton mailto:a...@linux-foundation.org;>a...@linux-foundation.org
wrote:

 I wonder which gcc versions actually accept Kees's addition.

Note that we already do have this pattern, as seen by:

 git grep -2 __builtin_choose_expr | grep -2
__builtin_constant_p

which show drivers/pinctrl/intel/pinctrl-intel.h, so the pattern
already exists current kernels and _works_ - it apparently just
doesn't work in slightly more complicated cases.

It's one reason why I wondered if simplifying the expression to have
just that single __builtin_constant_p() might not end up working..

   Linus
--
Kees
CookPixel Security

Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-09 Thread Kees Cook

On Fri, Mar 9, 2018 at 1:10 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Fri, Mar 9, 2018 at 12:05 PM, Kees Cook <keesc...@chromium.org> wrote:
>> When max() is used in stack array size calculations from literal values
>> (e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
>> thinks this is a dynamic calculation due to the single-eval logic, which
>> is not needed in the literal case. This change removes several accidental
>> stack VLAs from an x86 allmodconfig build:
>
> Ok, looks good.
>
> I just have a couple of questions about applying it.
>
> In particular, if this will help people working on getting rid of
> VLA's in the short term, I can apply it directly.

AFAIK, all the VLAs effected by max() are fixed with this patch. i.e.
no other VLA work I'm aware of depends on this (famous last words).

> But if people who are looking at it (anybody else than Kees?)

FWIW, I've seen at least 6 other people helping out now with VLA clean
up patches. Whee. :)

> don't much care, then this
> might be a 4.17 thing or at least "random -mm queue"?

Andrew has this (v2) queued in -mm for 4.17. I didn't view VLA clean
up (or this macro change) as urgent by any means.

> #define __single_eval_max(max1, max2, x, y) ({  \
> typeof (x) max1 = (x);  \
> typeof (y) max2 = (y);  \
> max1 > max2 ? max1 : max2; })
>
> actually looks more natural to me than passing the two types in as
> arguments to the macro.

I (obviously) didn't design this macro originally, but my take on this
was that it's safer to keep the type check together with the
comparison so that it is never possible for someone to run off and use
__single_eval_max() directly and accidentally bypass the type check.
While it does look silly from the max_t()/min_t() perspective, it just
seems more defensive.

> (That form also is amenable to things like "__auto_type" etc simplifications).

Agreed, I think that's the biggest reason to lift the types. However,
since we're still not actually doing anything with the types (i.e.
this change doesn't weaken the type checking), I would think it would
be orthogonal. While it would be nice to resolve differing types
sanely, there doesn't appear to be a driving reason to make this
change. The example case of max(5, sizeof(thing)) doesn't currently
exist in the code and I don't see how making min()/max() _less_ strict
would be generically useful (in fact, it has proven useful to have
strict type checking).

> Side note: do we *really* need the unique variable names? That's what
> makes those things _really_ illegible. I thgink it's done just for a
> sparse warning that we should probably ignore. It's off by default
> anyway exactly because it doesn't work that well due to nested macro
> expansions like this.

That I'm not sure about. I'd be fine to remove them; I left them in
place because it seemed quite intentional. :)

> There is very real value to keeping our odd macros legible, I feel.
> Even when they are complicated by issues like this, it would be good
> to keep them as simple as possible.
>
> Comments?

I'm open to whatever! I'm just glad this gets to kill a handful of
"accidental" stack VLAs. :)

-Kees

-- 
Kees Cook
Pixel SecurityOn
Fri, Mar 9, 2018 at 1:10 PM, Linus Torvalds mailto:torva...@linux-foundation.org;
target="_blank">torva...@linux-foundation.org
wrote:On
Fri, Mar 9, 2018 at 12:05 PM, Kees Cook mailto:keesc...@chromium.org;>keesc...@chromium.org
wrote:
 When max() is used in stack array size calculations from literal values
 (e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
 thinks this is a dynamic calculation due to the single-eval
logic, which
 is not needed in the literal case. This change removes several
accidental
 stack VLAs from an x86 allmodconfig build:

Ok, looks good.

I just have a couple of questions about applying it.

In particular, if this will help people working on getting rid of
VLA's in the short term, I can apply it directly. But if people who
are looking at it (anybody else than Kees?) don't much care, then this
might be a 4.17 thing or at least "random -mm queue"?

The other unrelated reaction I had to this was that "we're passing
those types down very deep, even though nobody _cares_ about them all
that much at that deep level".

Honestly, the only case that really cares is the very top level, and
the rest could just take the properly cast versions.

For example, "max_t/min_t" really don't care at all, since they - by
definition - just take the single specified type.

So I'm wondering if we should just drop the types from __max/__min
(and everything they call) entirely, and instead do

  #define __check_type(x,y)
((void)((typeof(x)*)1==(typeof(y)

[PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-09 Thread Kees Cook

When max() is used in stack array size calculations from literal values
(e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
thinks this is a dynamic calculation due to the single-eval logic, which
is not needed in the literal case. This change removes several accidental
stack VLAs from an x86 allmodconfig build:

$ diff -u before.txt after.txt | grep ^-
-drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids 
variable length array ‘ids’ [-Wvla]
-fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array 
‘namebuf’ [-Wvla]
-lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ 
[-Wvla]
-net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ 
[-Wvla]

Based on an earlier patch from Josh Poimboeuf.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
v3:
- drop __builtin_types_compatible_p() (Rasmus, Linus)
v2:
- fix copy/paste-o max1_/max2_ (ijc)
- clarify "compile-time" constant in comment (Rasmus)
- clean up formatting on min_t()/max_t()
---
 include/linux/kernel.h | 48 ++--
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..a0fca4deb3ab 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -787,37 +787,55 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * strict type-checking.. See the
  * "unnecessary" pointer comparison.
  */
-#define __min(t1, t2, min1, min2, x, y) ({ \
+#define __single_eval_min(t1, t2, min1, min2, x, y) ({ \
t1 min1 = (x);  \
t2 min2 = (y);  \
(void) ( == );\
min1 < min2 ? min1 : min2; })
 
+/*
+ * In the case of compile-time constant values, there is no need to do
+ * the double-evaluation protection, so the raw comparison can be made.
+ * This allows min()/max() to be used in stack array allocations and
+ * avoid the compiler thinking it is a dynamic value leading to an
+ * accidental VLA.
+ */
+#define __min(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y),  \
+ (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_min(t1, t2, \
+   __UNIQUE_ID(min1_), \
+   __UNIQUE_ID(min2_), \
+   x, y))
+
 /**
  * min - return minimum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define min(x, y)  \
-   __min(typeof(x), typeof(y), \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define min(x, y)  __min(typeof(x), typeof(y), x, y)
 
-#define __max(t1, t2, max1, max2, x, y) ({ \
+#define __single_eval_max(t1, t2, max1, max2, x, y) ({ \
t1 max1 = (x);  \
t2 max2 = (y);  \
(void) ( == );\
max1 > max2 ? max1 : max2; })
 
+#define __max(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y),  \
+ (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_max(t1, t2, \
+   __UNIQUE_ID(max1_), \
+   __UNIQUE_ID(max2_), \
+   x, y))
 /**
  * max - return maximum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define max(x, y)  \
-   __max(typeof(x), typeof(y), \
- __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
- x, y)
+#define max(x, y)  __max(typeof(x), typeof(y), x, y)
 
 /**
  * min3 - return minimum of three values
@@ -869,10 +887,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * @x: first value
  * @y: second value
  */
-#define min_t(type, x, y)  \
-   __min(type, type,   \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
-

Re: [PATCH net-next] modules: allow modprobe load regular elf binaries

2018-03-09 Thread Kees Cook

On Fri, Mar 9, 2018 at 10:50 AM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Fri, Mar 9, 2018 at 10:43 AM, Kees Cook <keesc...@chromium.org> wrote:
>>
>> Module loading (via kernel_read_file()) already uses
>> deny_write_access(), and so does do_open_execat(). As long as module
>> loading doesn't call allow_write_access() before the execve() has
>> started in the new implementation, I think we'd be covered here.
>
> No. kernel_read_file() only does it *during* the read.

Ah, true. And looking at this again, shouldn't deny_write_access()
happen _before_ the LSM check in kernel_read_file()? That looks like a
problem...

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH net-next] modules: allow modprobe load regular elf binaries

2018-03-09 Thread Kees Cook

On Fri, Mar 9, 2018 at 10:35 AM, David Miller <da...@davemloft.net> wrote:
> From: Linus Torvalds <torva...@linux-foundation.org>
> Date: Fri, 9 Mar 2018 10:17:42 -0800
>
>>  - use deny_write_access() to make sure that we don't have active
>> writers and cannot get them during the execve.
>
> I agree that this is necessary for image validation purposes.

Module loading (via kernel_read_file()) already uses
deny_write_access(), and so does do_open_execat(). As long as module
loading doesn't call allow_write_access() before the execve() has
started in the new implementation, I think we'd be covered here.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 5:35 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> I don't want to weaken the type enforcement, and I _thought_ you had
> done that __builtin_types_compatible_p() to keep it in place.

I thought so too (that originally came from Josh), but on removal, I
was surprised that the checking was retained. :)

> But if that's not why you did it, then why was it there at all? If the
> type warning shows through even if it's in the other expression, then
> just a
>
>
> #define __min(t1, t2, x, y) \
> __builtin_choose_expr(  \
> __builtin_constant_p(x) &   \
> __builtin_constant_p(y),\
> (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),  \
> __single_eval_min(t1, t2,   \
>...
>
> would seem to be sufficient?
>
> Because logically, the only thing that matters is that x and y don't
> have any side effects and can be evaluated twice, and
> "__builtin_constant_p()" is already a much stronger version of that.
>
> Hmm? The __builtin_types_compatible_p() just doesn't seem to matter
> for the only thing I thought it was there for.

Yup, agreed. I'll drop it.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH net-next] modules: allow modprobe load regular elf binaries

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 5:38 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Thu, Mar 8, 2018 at 4:59 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>>
>> Also, I don't see how this is any more exploitable than any other
>> init_module().
>
> Absolutely. If Kees doesn't trust the files to be loaded, an
> executable - even if it's running with root privileges and in the
> initns - is still fundamentally weaker than a kernel module.
>
> So I don't understand the security argument AT ALL. It's nonsensical.
> The executable loading does all the same security checks that the
> module loading does, including the signing check.
>
> And the whole point is that we can now do things with building and
> loading a ebpf rule instead of having a full module.

My concerns are mostly about crossing namespaces. If a container
triggers an autoload, the result runs in the init_ns. So, really,
there's nothing new from my perspective, except that it's in userspace
instead of in the kernel.

Perhaps it's an orthogonal concern.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH net-next] modules: allow modprobe load regular elf binaries

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 4:57 PM, Alexei Starovoitov <a...@fb.com> wrote:
> The above are three paragraphs of security paranoia without single
> concrete example of a security issue.

How is running an arbitrary ELF as full init_ns root from a container
not a concrete example?

I'm not saying this approach can never happen. And this isn't
paranoia. There are very real security boundary violations in this
model, IMO. Though, as Andy says, it's more about autoloading than umh
itself. I just don't want to extend that problem further.

>> As Andy asked earlier, why not DYN too to catch PIE executables? Seems
>> like forcing the userspace helper to be non-PIE would defeat some of
>> the userspace defenses in use in most distros.
>
>
> because we don't add features without concrete users.

It's just a two-line change, and then it would work on distros that
build PIE-by-default. That seems like a concrete use-case.

>> The exec.c changes should be split into a separate patch. Something
>> feels incorrectly refactored here, though. Passing all three of fd,
>> filename, and file to __do_execve_file() seems wrong; perhaps the fd
>> to file handling needs to happen externally in what you have here as
>> do_execveat_common()? The resulting __do_execve_file() has repeated
>> conditionals on filename... maybe what I object to is being able to
>> pass a NULL filename at all. The filename can be (painfully)
>> reconstructed from the struct file.
>
> reconstruct the path and instantly introduce the race between execing
> something by path vs prior check that it's actual elf of already
> opened file ?!
> excellent suggestion to improve security.

I'm not sure why you're being so hostile. We're both interested in
improving the kernel.

Path names aren't stable, but they provide good _debugging_ about
things when needed. For example, the LSM hooks, if they report on one
of these loads, can produce a hint to the user about what happened.
Passing "/dev/null" around is confusing at the very least. The ELF is
absolutely not /dev/null. Why make someone guess?

>>> [...]
>>> diff --git a/kernel/module.c b/kernel/module.c
>>> index ad2d420024f6..6cfa35795741 100644
>>> --- a/kernel/module.c
>>> +++ b/kernel/module.c
>>> [...]
>>> @@ -3870,14 +3896,21 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char
>>> __user *, uargs, int, flags)
>>>   |MODULE_INIT_IGNORE_VERMAGIC))
>>> return -EINVAL;
>>>
>>> -   err = kernel_read_file_from_fd(fd, , , INT_MAX,
>>> -  READING_MODULE);
>>> +   f = fdget(fd);
>>> +   if (!f.file)
>>> +   return -EBADF;
>>> +
>>> +   err = kernel_read_file(f.file, , , INT_MAX,
>>> READING_MODULE);
>>
>>
>> For the LSM subsystem, I think this should also get it's own enum
>> kernel_read_file_id. This is really no longer a kernel module...
>
>
> at this point it's a _file_. It could have been text file just well.
> If lsm is thinking that at this point kernel is processing
> kernel module that lsm is badly broken.

Again, this is about making things more discoverable. We already know
if we're loading a kernel module or a umh ELF. Passing this
information to the LSM is valuable to the LSM to distinguish between
types of files. They have very different effects on the system, for
example. The issue is about validating intent with target. "Is this
kernel module allowed?" and "Is this umh ELF allowed?" are better
questions that "Is something loaded through finit_module() allowed?"
Note already that the LSM distinguishes between many other file types
in kernel_read_file*().

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 3:48 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Thu, Mar 8, 2018 at 1:40 PM, Kees Cook <keesc...@chromium.org> wrote:
>> +#define __min(t1, t2, x, y)\
>> +   __builtin_choose_expr(__builtin_constant_p(x) &&\
>> + __builtin_constant_p(y) &&\
>> + __builtin_types_compatible_p(t1, t2), \
>> + (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
>
> I understand why you use __builtin_types_compatible_p(), but please don't.
>
> It will mean that trivial constants like "5" and "sizeof(x)" won't
> simplify, because they have different types.

Rasmus mentioned this too. What I said there was that I was shy to
make that change, since we already can't mix that kind of thing with
the existing min()/max() implementation. The existing min()/max() is
already extremely strict, so there are no instances of this in the
tree. If I explicitly add one, I see this with or without the patch:

In file included from drivers/misc/lkdtm.h:7:0,
 from drivers/misc/lkdtm_core.c:33:
drivers/misc/lkdtm_core.c: In function ‘lkdtm_module_exit’:
./include/linux/kernel.h:809:16: warning: comparison of distinct
pointer types lacks a cast
  (void) ( == );   \
^
./include/linux/kernel.h:818:2: note: in expansion of macro ‘__max’
  __max(typeof(x), typeof(y),   \
  ^
./include/linux/printk.h:308:34: note: in expansion of macro ‘max’
  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
  ^~~
drivers/misc/lkdtm_core.c:500:2: note: in expansion of macro ‘pr_info’
  pr_info("%lu\n", max(16, sizeof(unsigned long)));
  ^~~

> The ?: will give the right combined type anyway, and if you want the
> type comparison warning, just add a comma-expression with something
> like like
>
>(t1 *)1 == (t2 *)1
>
> to get the type compatibility warning.

When I tried removing __builtin_types_compatible_p(), I still got the
type-check warning because I think the preprocessor still sees the
"(void) ( == )" before optimizing? So, I technically _can_
drop the __builtin_types_compatible_p(), and still keep the type
warning. :P

> Yeah, yeah, maybe none of the VLA cases triggered that, but it seems
> silly to not just get that obvious constant case right.
>
> Hmm?

So are you saying you _want_ the type enforcement weakened here, or
that I should just not use __builtin_types_compatible_p()?

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH net-next] modules: allow modprobe load regular elf binaries

2018-03-08 Thread Kees Cook

*)sub_info->envp);
> +   if (sub_info->file)
> +   retval = do_execve_file(sub_info->file,
> +   sub_info->argv, sub_info->envp);
> +   else
> +   retval = do_execve(getname_kernel(sub_info->path),
> +  (const char __user *const __user 
> *)sub_info->argv,
> +  (const char __user *const __user 
> *)sub_info->envp);
>  out:
> sub_info->retval = retval;
> /*
> @@ -393,6 +397,20 @@ struct subprocess_info *call_usermodehelper_setup(const 
> char *path, char **argv,
>  }
>  EXPORT_SYMBOL(call_usermodehelper_setup);
>
> +struct subprocess_info *call_usermodehelper_setup_file(struct file *file)
> +{
> +   struct subprocess_info *sub_info;
> +
> +   sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL);
> +   if (!sub_info)
> +   return NULL;
> +
> +   INIT_WORK(_info->work, call_usermodehelper_exec_work);
> +   sub_info->path = "/dev/null";
> +   sub_info->file = file;

This use of "/dev/null" here and in execve is just wrong. It _does_
have a path and filename...

Anyway, interesting idea. I think it _can_ work, it just needs much
much more careful security boundaries and to solve our autoloading
exposures too.

-Kees

-- 
Kees Cook
Pixel Security

[PATCH v2] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

When max() is used in stack array size calculations from literal values
(e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
thinks this is a dynamic calculation due to the single-eval logic, which
is not needed in the literal case. This change removes several accidental
stack VLAs from an x86 allmodconfig build:

$ diff -u before.txt after.txt | grep ^-
-drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids 
variable length array ‘ids’ [-Wvla]
-fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array 
‘namebuf’ [-Wvla]
-lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ 
[-Wvla]
-net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ 
[-Wvla]

Based on an earlier patch from Josh Poimboeuf.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
v2:
- fix copy/paste-o max1_/max2_ (ijc)
- clarify "compile-time" constant in comment (Rasmus)
- clean up formatting on min_t()/max_t()
---
 include/linux/kernel.h | 50 --
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..108cdf7bd484 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -787,37 +787,57 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * strict type-checking.. See the
  * "unnecessary" pointer comparison.
  */
-#define __min(t1, t2, min1, min2, x, y) ({ \
+#define __single_eval_min(t1, t2, min1, min2, x, y) ({ \
t1 min1 = (x);  \
t2 min2 = (y);  \
(void) ( == );\
min1 < min2 ? min1 : min2; })
 
+/*
+ * In the case of compile-time constant values, there is no need to do
+ * the double-evaluation protection, so the raw comparison can be made.
+ * This allows min()/max() to be used in stack array allocations and
+ * avoid the compiler thinking it is a dynamic value leading to an
+ * accidental VLA.
+ */
+#define __min(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y) &&\
+ __builtin_types_compatible_p(t1, t2), \
+ (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_min(t1, t2, \
+   __UNIQUE_ID(min1_), \
+   __UNIQUE_ID(min2_), \
+   x, y))
+
 /**
  * min - return minimum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define min(x, y)  \
-   __min(typeof(x), typeof(y), \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define min(x, y)  __min(typeof(x), typeof(y), x, y)
 
-#define __max(t1, t2, max1, max2, x, y) ({ \
+#define __single_eval_max(t1, t2, max1, max2, x, y) ({ \
t1 max1 = (x);  \
t2 max2 = (y);  \
(void) ( == );\
max1 > max2 ? max1 : max2; })
 
+#define __max(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y) &&\
+ __builtin_types_compatible_p(t1, t2), \
+ (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_max(t1, t2, \
+   __UNIQUE_ID(max1_), \
+   __UNIQUE_ID(max2_), \
+   x, y))
 /**
  * max - return maximum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define max(x, y)  \
-   __max(typeof(x), typeof(y), \
- __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
- x, y)
+#define max(x, y)  __max(typeof(x), typeof(y), x, y)
 
 /**
  * min3 - return minimum of three values
@@ -869,10 +889,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * @x: first value
  * @y: second value
  */
-#define min_t(type, x, y)  \
-   __min(type, type,

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 2:12 PM, Rasmus Villemoes
<li...@rasmusvillemoes.dk> wrote:
> On 8 March 2018 at 21:39, Kees Cook <keesc...@chromium.org> wrote:
>> However, this works for me:
>>
>> #define __new_max(t1, t2, max1, max2, x, y)\
>>__builtin_choose_expr(__builtin_constant_p(x) && \
>>  __builtin_constant_p(y) && \
>>  __builtin_types_compatible_p(t1, t2), \
>>  (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
>>  __max(t1, t2, max1, max2, x, y))
>>
>> #define new_max(x, y) \
>> __new_max(typeof(x), typeof(y), \
>>   __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
>>   x, y)
>
> Yes, that would seem to do the trick.
>
> Thinking out loud: do we really want or need the
> __builtin_types_compatible condition when x and y are compile-time
> constants? I think it would be nice to be able to use max(16,
> sizeof(bla)) without having to cast either the literal or the sizeof.
> Just omitting the type compatibility check might be dangerous, but
> perhaps it could be relaxed to a check that both values are
> representable in their common promoted type. Something like
>
> (type_signed(t1) == type_signed(t2)) || ((t1)x >= 0 && (t2)y >= 0)
>
> should be safe (if the types have same signedness, or the value of
> signed type is positive), though it doesn't allow a few corner cases
> (e.g. short vs. unsigned char is always ok due to promotion to int,
> and also if the signed type is strictly wider than the unsigned type).

I agree, it would be nice. However, I think it'd be better to continue
to depend on max_t() for these kinds of cases though. For example:

char foo[max_t(size_t, 6, sizeof(something))];

Works with the proposed patch.

Also, I think this mismatch would already be triggering warnings, so
we shouldn't have any currently.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 2:18 PM, Andrew Morton <a...@linux-foundation.org> wrote:
> On Thu, 8 Mar 2018 13:40:45 -0800 Kees Cook <keesc...@chromium.org> wrote:
>
>> When max() is used in stack array size calculations from literal values
>> (e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
>> thinks this is a dynamic calculation due to the single-eval logic, which
>> is not needed in the literal case. This change removes several accidental
>> stack VLAs from an x86 allmodconfig build:
>>
>> $ diff -u before.txt after.txt | grep ^-
>> -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids 
>> variable length array ‘ids’ [-Wvla]
>> -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length 
>> array ‘namebuf’ [-Wvla]
>> -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ 
>> [-Wvla]
>> -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array 
>> ‘buff’ [-Wvla]
>> -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array 
>> ‘buff’ [-Wvla]
>> -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array 
>> ‘buff64’ [-Wvla]
>>
>> Based on an earlier patch from Josh Poimboeuf.
>>
>> ...
>>
>> --- a/include/linux/kernel.h
>> +++ b/include/linux/kernel.h
>> @@ -787,37 +787,57 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
>> oops_dump_mode) { }
>>   * strict type-checking.. See the
>>   * "unnecessary" pointer comparison.
>>   */
>> -#define __min(t1, t2, min1, min2, x, y) ({   \
>> +#define __single_eval_min(t1, t2, min1, min2, x, y) ({   \
>>   t1 min1 = (x);  \
>>   t2 min2 = (y);  \
>>   (void) ( == );\
>>   min1 < min2 ? min1 : min2; })
>>
>> +/*
>> + * In the case of builtin constant values, there is no need to do the
>> + * double-evaluation protection, so the raw comparison can be made.
>> + * This allows min()/max() to be used in stack array allocations and
>> + * avoid the compiler thinking it is a dynamic value leading to an
>> + * accidental VLA.
>> + */
>> +#define __min(t1, t2, x, y)  \
>> + __builtin_choose_expr(__builtin_constant_p(x) &&\
>> +   __builtin_constant_p(y) &&\
>> +   __builtin_types_compatible_p(t1, t2), \
>> +   (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
>> +   __single_eval_min(t1, t2, \
>> + __UNIQUE_ID(max1_), \
>> + __UNIQUE_ID(max2_), \
>> + x, y))
>> +
>
> Holy crap.
>
> I suppose gcc will one day be fixed and we won't need this.
>
> Is there a good reason to convert min()?  Surely nobody will be using
> min to dimension an array - always max?  Just for symmetry, I guess.

I just went with symmetry. It seems like an ugly risk to implement min
and mix differently. :) In theory it may produce smaller code for rare
min() uses, but I haven't actually verified that.

I will send a v2 with the two nits mentioned...

-Kees

-- 
Kees Cook
Pixel Security

[PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

When max() is used in stack array size calculations from literal values
(e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
thinks this is a dynamic calculation due to the single-eval logic, which
is not needed in the literal case. This change removes several accidental
stack VLAs from an x86 allmodconfig build:

$ diff -u before.txt after.txt | grep ^-
-drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids 
variable length array ‘ids’ [-Wvla]
-fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array 
‘namebuf’ [-Wvla]
-lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ 
[-Wvla]
-net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ 
[-Wvla]

Based on an earlier patch from Josh Poimboeuf.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 include/linux/kernel.h | 42 ++
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..e0b39d461582 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -787,37 +787,57 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * strict type-checking.. See the
  * "unnecessary" pointer comparison.
  */
-#define __min(t1, t2, min1, min2, x, y) ({ \
+#define __single_eval_min(t1, t2, min1, min2, x, y) ({ \
t1 min1 = (x);  \
t2 min2 = (y);  \
(void) ( == );\
min1 < min2 ? min1 : min2; })
 
+/*
+ * In the case of builtin constant values, there is no need to do the
+ * double-evaluation protection, so the raw comparison can be made.
+ * This allows min()/max() to be used in stack array allocations and
+ * avoid the compiler thinking it is a dynamic value leading to an
+ * accidental VLA.
+ */
+#define __min(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y) &&\
+ __builtin_types_compatible_p(t1, t2), \
+ (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_min(t1, t2, \
+   __UNIQUE_ID(max1_), \
+   __UNIQUE_ID(max2_), \
+   x, y))
+
 /**
  * min - return minimum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define min(x, y)  \
-   __min(typeof(x), typeof(y), \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define min(x, y)  __min(typeof(x), typeof(y), x, y)
 
-#define __max(t1, t2, max1, max2, x, y) ({ \
+#define __single_eval_max(t1, t2, max1, max2, x, y) ({ \
t1 max1 = (x);  \
t2 max2 = (y);  \
(void) ( == );\
max1 > max2 ? max1 : max2; })
 
+#define __max(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y) &&\
+ __builtin_types_compatible_p(t1, t2), \
+ (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_max(t1, t2, \
+   __UNIQUE_ID(max1_), \
+   __UNIQUE_ID(max2_), \
+   x, y))
 /**
  * max - return maximum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define max(x, y)  \
-   __max(typeof(x), typeof(y), \
- __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
- x, y)
+#define max(x, y)  __max(typeof(x), typeof(y), x, y)
 
 /**
  * min3 - return minimum of three values
@@ -871,7 +891,6 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  */
 #define min_t(type, x, y)  \
__min(type, type,   \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
  x, y)
 
 /**
@@ -882,7 +901,6 @@ static inline void ftrace_dump(enu

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 11:57 AM, Rasmus Villemoes
<li...@rasmusvillemoes.dk> wrote:
> On 2018-03-08 16:02, Josh Poimboeuf wrote:
>> On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
>>> This series adds SIMPLE_MAX() to be used in places where a stack array
>>> is actually fixed, but the compiler still warns about VLA usage due to
>>> confusion caused by the safety checks in the max() macro.
>>>
>>> I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
>>> and they should all have no operational differences.
>>
>> What if we instead simplify the max() macro's type checking so that GCC
>> can more easily fold the array size constants?  The below patch seems to
>> work:
>>
>
>> +extern long __error_incompatible_types_in_min_macro;
>> +extern long __error_incompatible_types_in_max_macro;
>> +
>> +#define __min(t1, t2, x, y)  \
>> + __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
>> +   (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
>> +   (t1)__error_incompatible_types_in_min_macro)
>>
>>  /**
>>   * min - return minimum of two values of the same or compatible types
>>   * @x: first value
>>   * @y: second value
>>   */
>> -#define min(x, y)\
>> - __min(typeof(x), typeof(y), \
>> -   __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
>> -   x, y)
>> +#define min(x, y) __min(typeof(x), typeof(y), x, y)  \
>>
>
> But this introduces the the-chosen-one-of-x-and-y-gets-evaluated-twice
> problem. Maybe we don't care? But until we get a
> __builtin_assert_this_has_no_side_effects() I think that's a little
> dangerous.

Eek, yes, we can't do the double-eval. The proposed change breaks
things badly. :)

a:   20
b:   40
max(a++, b++): 40
a:   21
b:   41

a:   20
b:   40
new_max(a++, b++): 41
a:   21
b:   42

However, this works for me:

#define __new_max(t1, t2, max1, max2, x, y)\
   __builtin_choose_expr(__builtin_constant_p(x) && \
 __builtin_constant_p(y) && \
 __builtin_types_compatible_p(t1, t2), \
 (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
 __max(t1, t2, max1, max2, x, y))

#define new_max(x, y) \
__new_max(typeof(x), typeof(y), \
  __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
  x, y)

(pardon the whitespace damage...)

Let me spin a sane patch and test it...

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 7:02 AM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
> On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
>> This series adds SIMPLE_MAX() to be used in places where a stack array
>> is actually fixed, but the compiler still warns about VLA usage due to
>> confusion caused by the safety checks in the max() macro.
>>
>> I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
>> and they should all have no operational differences.
>
> What if we instead simplify the max() macro's type checking so that GCC
> can more easily fold the array size constants?  The below patch seems to
> work:

Oh magic! Very nice. I couldn't figure out how to do this when I
stared at it. Yes, let me respin. (I assume I can add your S-o-b?)

-Kees

>
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 3fd291503576..ec863726da29 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -782,42 +782,32 @@ ftrace_vprintk(const char *fmt, va_list ap)
>  static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
>  #endif /* CONFIG_TRACING */
>
> -/*
> - * min()/max()/clamp() macros that also do
> - * strict type-checking.. See the
> - * "unnecessary" pointer comparison.
> - */
> -#define __min(t1, t2, min1, min2, x, y) ({ \
> -   t1 min1 = (x);  \
> -   t2 min2 = (y);  \
> -   (void) ( == );\
> -   min1 < min2 ? min1 : min2; })
> +extern long __error_incompatible_types_in_min_macro;
> +extern long __error_incompatible_types_in_max_macro;
> +
> +#define __min(t1, t2, x, y)\
> +   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> + (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
> + (t1)__error_incompatible_types_in_min_macro)
>
>  /**
>   * min - return minimum of two values of the same or compatible types
>   * @x: first value
>   * @y: second value
>   */
> -#define min(x, y)  \
> -   __min(typeof(x), typeof(y), \
> - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> - x, y)
> +#define min(x, y) __min(typeof(x), typeof(y), x, y)\
>
> -#define __max(t1, t2, max1, max2, x, y) ({ \
> -   t1 max1 = (x);  \
> -   t2 max2 = (y);  \
> -   (void) ( == );\
> -   max1 > max2 ? max1 : max2; })
> +#define __max(t1, t2, x, y)\
> +   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> + (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
> + (t1)__error_incompatible_types_in_max_macro)
>
>  /**
>   * max - return maximum of two values of the same or compatible types
>   * @x: first value
>   * @y: second value
>   */
> -#define max(x, y)  \
> -   __max(typeof(x), typeof(y), \
> - __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
> - x, y)
> +#define max(x, y) __max(typeof(x), typeof(y), x, y)
>
>  /**
>   * min3 - return minimum of three values
> @@ -869,10 +859,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> oops_dump_mode) { }
>   * @x: first value
>   * @y: second value
>   */
> -#define min_t(type, x, y)  \
> -   __min(type, type,   \
> - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> - x, y)
> +#define min_t(type, x, y) __min(type, type, x, y)
>
>  /**
>   * max_t - return maximum of two values, using the specified type
> @@ -880,10 +867,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> oops_dump_mode) { }
>   * @x: first value
>   * @y: second value
>   */
> -#define max_t(type, x, y)  \
> -   __max(type, type,   \
> - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> - x, y)
> +#define max_t(type, x, y) __max(type, type, x, y)
>   \
>
>  /**
>   * clamp_t - return a value clamped to a given range using a given type



-- 
Kees Cook
Pixel Security

Please add "NFC: llcp: Limit size of SDP URI" for stable

2018-03-07 Thread Kees Cook

Hi,

I don't see fe9c842695e2 ("NFC: llcp: Limit size of SDP URI") queued
up for stable. Can this one be added please? This is a buffer overflow
fix.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

[PATCH 2/3] net: Remove accidental VLAs from proc buffers

2018-03-07 Thread Kees Cook

In the quest to remove all stack VLAs from the kernel[1], this refactors
the stack array size calculation to avoid using max(), which makes the
compiler think the size isn't fixed.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 net/ipv4/proc.c | 10 --
 net/ipv6/proc.c | 10 --
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index dc5edc8f7564..c23c43803435 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -46,8 +46,6 @@
 #include 
 #include 
 
-#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
-
 /*
  * Report socket allocation statistics [m...@utu.fi]
  */
@@ -400,11 +398,11 @@ static int snmp_seq_show_ipstats(struct seq_file *seq, 
void *v)
 
 static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
 {
-   unsigned long buff[TCPUDP_MIB_MAX];
+   unsigned long buff[SIMPLE_MAX(UDP_MIB_MAX, TCP_MIB_MAX)];
struct net *net = seq->private;
int i;
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
seq_puts(seq, "\nTcp:");
for (i = 0; snmp4_tcp_list[i].name; i++)
@@ -421,7 +419,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
seq_printf(seq, " %lu", buff[i]);
}
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
snmp_get_cpu_field_batch(buff, snmp4_udp_list,
 net->mib.udp_statistics);
@@ -432,7 +430,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %lu", buff[i]);
 
-   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+   memset(buff, 0, sizeof(buff));
 
/* the UDP and UDP-Lite MIBs are the same */
seq_puts(seq, "\nUdpLite:");
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index b67814242f78..5b0874c26802 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -30,10 +30,8 @@
 #include 
 #include 
 
-#define MAX4(a, b, c, d) \
-   max_t(u32, max_t(u32, a, b), max_t(u32, c, d))
-#define SNMP_MIB_MAX MAX4(UDP_MIB_MAX, TCP_MIB_MAX, \
-   IPSTATS_MIB_MAX, ICMP_MIB_MAX)
+#define SNMP_MIB_MAX SIMPLE_MAX(SIMPLE_MAX(UDP_MIB_MAX, TCP_MIB_MAX), \
+   SIMPLE_MAX(IPSTATS_MIB_MAX, ICMP_MIB_MAX))
 
 static int sockstat6_seq_show(struct seq_file *seq, void *v)
 {
@@ -199,7 +197,7 @@ static void snmp6_seq_show_item(struct seq_file *seq, void 
__percpu *pcpumib,
int i;
 
if (pcpumib) {
-   memset(buff, 0, sizeof(unsigned long) * SNMP_MIB_MAX);
+   memset(buff, 0, sizeof(buff));
 
snmp_get_cpu_field_batch(buff, itemlist, pcpumib);
for (i = 0; itemlist[i].name; i++)
@@ -218,7 +216,7 @@ static void snmp6_seq_show_item64(struct seq_file *seq, 
void __percpu *mib,
u64 buff64[SNMP_MIB_MAX];
int i;
 
-   memset(buff64, 0, sizeof(u64) * SNMP_MIB_MAX);
+   memset(buff64, 0, sizeof(buff64));
 
snmp_get_cpu_field64_batch(buff64, itemlist, mib, syncpoff);
for (i = 0; itemlist[i].name; i++)
-- 
2.7.4

[PATCH 3/3] btrfs: tree-checker: Avoid accidental stack VLA

2018-03-07 Thread Kees Cook

In the quest to remove all stack VLAs from the kernel[1], this refactors
the stack array size calculation to avoid using max(), which makes the
compiler think the size isn't fixed.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 fs/btrfs/tree-checker.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index c3c8d48f6618..59bd07694118 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -341,7 +341,8 @@ static int check_dir_item(struct btrfs_root *root,
 */
if (key->type == BTRFS_DIR_ITEM_KEY ||
key->type == BTRFS_XATTR_ITEM_KEY) {
-   char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)];
+   char namebuf[SIMPLE_MAX(BTRFS_NAME_LEN,
+   XATTR_NAME_MAX)];
 
read_extent_buffer(leaf, namebuf,
(unsigned long)(di + 1), name_len);
-- 
2.7.4

[PATCH 0/3] Remove accidental VLA usage

2018-03-07 Thread Kees Cook

This series adds SIMPLE_MAX() to be used in places where a stack array
is actually fixed, but the compiler still warns about VLA usage due to
confusion caused by the safety checks in the max() macro.

I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
and they should all have no operational differences.

-Kees

[PATCH v2 1/3] vsprintf: Remove accidental VLA usage

2018-03-07 Thread Kees Cook

In the quest to remove all stack VLAs from the kernel[1], this introduces
a new "simple max" macro, and changes the "sym" array size calculation to
use it. The value is actually a fixed size, but since the max() macro uses
some extensive tricks for safety, it ends up looking like a variable size
to the compiler.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 include/linux/kernel.h | 11 +++
 lib/vsprintf.c |  4 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..1da554e9997f 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -820,6 +820,17 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  x, y)
 
 /**
+ * SIMPLE_MAX - return maximum of two values without any type checking
+ * @x: first value
+ * @y: second value
+ *
+ * This should only be used in stack array sizes, since the type-checking
+ * from max() confuses the compiler into thinking a VLA is being used.
+ */
+#define SIMPLE_MAX(x, y)   ((size_t)(x) > (size_t)(y) ? (size_t)(x) \
+  : (size_t)(y))
+
+/**
  * min3 - return minimum of three values
  * @x: first value
  * @y: second value
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index d7a708f82559..50cce36e1cdc 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -744,8 +744,8 @@ char *resource_string(char *buf, char *end, struct resource 
*res,
 #define FLAG_BUF_SIZE  (2 * sizeof(res->flags))
 #define DECODED_BUF_SIZE   sizeof("[mem - 64bit pref window disabled]")
 #define RAW_BUF_SIZE   sizeof("[mem - flags 0x]")
-   char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
-2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
+   char sym[SIMPLE_MAX(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
+   2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
 
char *p = sym, *pend = sym + sizeof(sym);
int decode = (fmt[0] == 'R') ? 1 : 0;
-- 
2.7.4

Re: Issue accessing task_struct from BPF due to 4.16 stack-protector changes

2018-03-02 Thread Kees Cook

On Fri, Mar 2, 2018 at 2:26 PM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Fri, Mar 02, 2018 at 02:04:17PM -0800, Gianluca Borello wrote:
>> On Fri, Mar 2, 2018 at 12:42 PM, Alexei Starovoitov
>> <alexei.starovoi...@gmail.com> wrote:
>> >
>> > good catch!
>> > I wonder why sched.h is using this flag insead of relying on #defines from 
>> > autoconf.h
>> > It could have been using CONFIG_HAVE_CC_STACKPROTECTOR
>> > instead of CONFIG_CC_STACKPROTECTOR, no ?
>> >
>>
>> Thanks for your reply Alexei. I think switching to
>> HAVE_CC_STACKPROTECTOR could indeed solve this particular BPF issue in
>> a cleaner way (I tested it), at the cost of having that struct member
>> always present for the supported architectures even if the stack
>> protector is actually disabled (e.g. CONFIG_CC_STACKPROTECTOR_NONE=y).
>
> if defined(HAVE_CC_STACKPROTECTOR) && !defined(CONFIG_CC_STACKPROTECTOR_NONE)

CONFIG_CC_STACKPROTECTOR_AUTO may result in no stack protector, so
CONFIG_CC_STACKPROTECTOR is the way to determine if it should exist.

> let's fix it properly instead of adding more hacks to Makefiles

It is being fixed properly -- the detection code is being moved out of
Makefile into Kconfig, at which point this won't be as weird as it is.

If KBUILD_CPPFLAGS won't work for you, I'm not hugely opposed to
switching the task_struct ifdef to HAVE_CC_STACKPROTECTOR, since it is
extremely rare to build without stack protector on architectures that
support it.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH 09/58] net/irda: Convert timers to use timer_setup()

2018-03-02 Thread Kees Cook

On Fri, Mar 2, 2018 at 1:29 PM, Marcelo Ricardo Leitner
<marcelo.leit...@gmail.com> wrote:
> Note how it is using the irda_start_timer definition from
> include/net/irda/timer.h instead of
> drivers/staging/irda/include/net/irda/timer.h which was patched in
> this patch.

$ git show net-next/master:include/net/irda/iriap.h
fatal: Path 'include/net/irda/iriap.h' does not exist in 'net-next/master'

4.14 moved include/net/irda/iriap.h into the staging directory:

5bf916ee0ab6 ("irda: move include/net/irda into staging subdirectory")

I think you've got a stale copy of the old file in your tree...

-Kees

-- 
Kees Cook
Pixel Security

Re: Issue accessing task_struct from BPF due to 4.16 stack-protector changes

2018-03-02 Thread Kees Cook

On Fri, Mar 2, 2018 at 2:04 PM, Gianluca Borello <g.bore...@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 12:42 PM, Alexei Starovoitov
> <alexei.starovoi...@gmail.com> wrote:
>>
>> good catch!
>> I wonder why sched.h is using this flag insead of relying on #defines from 
>> autoconf.h
>> It could have been using CONFIG_HAVE_CC_STACKPROTECTOR
>> instead of CONFIG_CC_STACKPROTECTOR, no ?
>>
>
> Thanks for your reply Alexei. I think switching to
> HAVE_CC_STACKPROTECTOR could indeed solve this particular BPF issue in
> a cleaner way (I tested it), at the cost of having that struct member
> always present for the supported architectures even if the stack
> protector is actually disabled (e.g. CONFIG_CC_STACKPROTECTOR_NONE=y).
>
> Not sure if this could be frowned upon by someone considering how
> critical task_struct is, but on the other hand is really just 8 bytes.

That structure is huge, and I think it's proper to leave this as is.

Adding KBUILD_CPPFLAGS (for now) seems like the right way to go;
though in the future stack protector will be changed around again (to
be purely Kconfig again). There are a number of issues with its logic
in detecting and enabling, and another draft at solving it is under
development.

-Kees

-- 
Kees Cook
Pixel Security

Re: [net-next v3 0/2] eBPF seccomp filters

2018-02-27 Thread Kees Cook

On Tue, Feb 27, 2018 at 8:59 AM, chris hyser <chris.hy...@oracle.com> wrote:
> On 02/27/2018 11:00 AM, Kees Cook wrote:
>>
>> On Tue, Feb 27, 2018 at 6:53 AM, chris hyser <chris.hy...@oracle.com>
>> wrote:
>>>
>>> On 02/26/2018 11:38 PM, Kees Cook wrote:
>>>>
>>>>
>>>> On Mon, Feb 26, 2018 at 8:19 PM, Andy Lutomirski <l...@amacapital.net>
>>>> wrote:
>>>>>
>>>>>
>>>>> 3. Straight-up bugs.  Those are exactly as problematic as verifier
>>>>> bugs in any other unprivileged eBPF program type, right?  I don't see
>>>>> why seccomp is special here.
>>>>
>>>>
>>>>
>>>> My concern is more about unintended design mistakes or other feature
>>>> creep with side-effects, especially when it comes to privileges and
>>>> synchronization. Getting no-new-privs done correctly, for example,
>>>> took some careful thought and discussion, and I'm shy from how painful
>>>> TSYNC was on the process locking side, and eBPF has had some rather
>>>> ugly flaws in the past (and recently: it was nice to be able to say
>>>> for Spectre that seccomp filters couldn't be constructed to make
>>>> attacks but eBPF could). Adding the complexity needs to be worth the
>>>> gain. I'm on board for doing it, I just want to be careful. :)
>>>
>>>
>>>
>>>
>>> Another option might be to remove c/eBPF from the equation all together.
>>> c/eBPF allows flexibility and that almost always comes at the cost of
>>> additional security risk. Seccomp is for enhanced security yes? How about
>>> a
>>> new seccomp mode that passes in something like a bit vector or hashmap
>>> for
>>> "simple" white/black list checks validated by kernel code, versus user
>>> provided interpreted code? Of course this removes a fair number of things
>>> you can currently do or would be able to do with eBPF. Of course,
>>> restated
>>> from a security point of view, this removes a fair number of things an
>>> _attacker_ can do. Presumably the performance improvement would also be
>>> significant.
>>>
>>> Is this an idea worth prototyping?
>>
>>
>> That was the original prototype for seccomp-filter. :) The discussion
>> around that from years ago basically boiled down to it being
>> inflexible. Given all the things people want to do at syscall time,
>> that continues to be true. So true, in fact, that here we are now,
>> trying to move to eBPF from cBPF. ;)
>
>
> I will try to find that discussion. As someone pointed out here though, eBPF

A good starting point might be this:
https://lwn.net/Articles/441232/

-Kees

-- 
Kees Cook
Pixel Security

Re: [net-next v3 0/2] eBPF seccomp filters

2018-02-27 Thread Kees Cook

On Tue, Feb 27, 2018 at 6:53 AM, chris hyser <chris.hy...@oracle.com> wrote:
> On 02/26/2018 11:38 PM, Kees Cook wrote:
>>
>> On Mon, Feb 26, 2018 at 8:19 PM, Andy Lutomirski <l...@amacapital.net>
>> wrote:
>>>
>>> 3. Straight-up bugs.  Those are exactly as problematic as verifier
>>> bugs in any other unprivileged eBPF program type, right?  I don't see
>>> why seccomp is special here.
>>
>>
>> My concern is more about unintended design mistakes or other feature
>> creep with side-effects, especially when it comes to privileges and
>> synchronization. Getting no-new-privs done correctly, for example,
>> took some careful thought and discussion, and I'm shy from how painful
>> TSYNC was on the process locking side, and eBPF has had some rather
>> ugly flaws in the past (and recently: it was nice to be able to say
>> for Spectre that seccomp filters couldn't be constructed to make
>> attacks but eBPF could). Adding the complexity needs to be worth the
>> gain. I'm on board for doing it, I just want to be careful. :)
>
>
>
> Another option might be to remove c/eBPF from the equation all together.
> c/eBPF allows flexibility and that almost always comes at the cost of
> additional security risk. Seccomp is for enhanced security yes? How about a
> new seccomp mode that passes in something like a bit vector or hashmap for
> "simple" white/black list checks validated by kernel code, versus user
> provided interpreted code? Of course this removes a fair number of things
> you can currently do or would be able to do with eBPF. Of course, restated
> from a security point of view, this removes a fair number of things an
> _attacker_ can do. Presumably the performance improvement would also be
> significant.
>
> Is this an idea worth prototyping?

That was the original prototype for seccomp-filter. :) The discussion
around that from years ago basically boiled down to it being
inflexible. Given all the things people want to do at syscall time,
that continues to be true. So true, in fact, that here we are now,
trying to move to eBPF from cBPF. ;)

-Kees

-- 
Kees Cook
Pixel Security

Re: [net-next v3 0/2] eBPF seccomp filters

2018-02-26 Thread Kees Cook

On Mon, Feb 26, 2018 at 8:19 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Feb 26, 2018, at 3:20 PM, Kees Cook <keesc...@chromium.org> wrote:
>>
>> On Mon, Feb 26, 2018 at 3:04 PM, Alexei Starovoitov
>> <alexei.starovoi...@gmail.com> wrote:
>>>> On Mon, Feb 26, 2018 at 07:26:54AM +, Sargun Dhillon wrote:
>>>> This patchset enables seccomp filters to be written in eBPF. Although, this
>>>> [...]
>>> The main statement I want to hear from seccomp maintainers before
>>> proceeding any further on this that enabling eBPF in seccomp won't lead
>>> to seccomp folks arguing against changes in bpf core (like verifier)
>>> just because it's used by seccomp.
>>> It must be spelled out in the commit log with explicit Ack.
>>
>> The primary thing I'm concerned about with eBPF and seccomp is
>> side-effects from eBPF programs running at syscall time. This is an
>> extremely sensitive area, and I want to be sure there won't be
>> feature-creep here that leads to seccomp getting into a bad state.
>>
>> As long as seccomp can continue have its own verifier, I *think* this
>> will be fine, though, again I remain concerned about maps, etc. I'm
>> still reviewing these patches and how they might provide overlap with
>> Tycho's needs too, etc.
>
> I'm not sure I see this as a huge problem.  As far as I can see, there
> are three ways that a verifier change could be problematic:
>
> 1. Addition of a new type of map.  But seccomp would just not allow
> new map types by default, right?
>
> 2. Addition of a new BPF_CALLable helper.  Seccomp wants a way to
> whitelist BPF_CALL targets.  That should be straightforward.

Yup, agreed on 1 and 2.

> 3. Straight-up bugs.  Those are exactly as problematic as verifier
> bugs in any other unprivileged eBPF program type, right?  I don't see
> why seccomp is special here.

My concern is more about unintended design mistakes or other feature
creep with side-effects, especially when it comes to privileges and
synchronization. Getting no-new-privs done correctly, for example,
took some careful thought and discussion, and I'm shy from how painful
TSYNC was on the process locking side, and eBPF has had some rather
ugly flaws in the past (and recently: it was nice to be able to say
for Spectre that seccomp filters couldn't be constructed to make
attacks but eBPF could). Adding the complexity needs to be worth the
gain. I'm on board for doing it, I just want to be careful. :)

-Kees

-- 
Kees Cook
Pixel Security

Re: [net-next v3 1/2] bpf, seccomp: Add eBPF filter capabilities

2018-02-26 Thread Kees Cook

On Mon, Feb 26, 2018 at 8:08 PM, Sargun Dhillon <sar...@sargun.me> wrote:
> On Mon, Feb 26, 2018 at 7:57 PM, Tycho Andersen <ty...@tycho.ws> wrote:
>> On Mon, Feb 26, 2018 at 07:49:48PM -0800, Sargun Dhillon wrote:
>>> On Mon, Feb 26, 2018 at 4:54 PM, Tycho Andersen <ty...@tycho.ws> wrote:
>>> > On Mon, Feb 26, 2018 at 07:27:05AM +, Sargun Dhillon wrote:
>>> >> +config SECCOMP_FILTER_EXTENDED
>>> >> + bool "Extended BPF seccomp filters"
>>> >> + depends on SECCOMP_FILTER && BPF_SYSCALL
>>> >> + depends on !CHECKPOINT_RESTORE
>>> >
>>> > Why not just give -EINVAL or something in case one of these is
>>> > requested, instead of making them incompatible at compile time?
>>> >
>>> > Tycho
>>> There's already code to return -EMEDIUMTYPE if it's a non-classic, or
>>> non-saved filter. Under the normal case, with CHECKPOINT_RESTORE
>>> enabled, you should never be able to get that. I think it makes sense
>>> to preserve this behaviour.
>>
>> Oh, right. So can't we just drop this, and the existing code will
>> DTRT, i.e. give you -EMEDIUMTYPE because the new filters aren't
>> supported, until they are?
>>
>> Tycho
> My suggestion is we merge this as is, so we don't break checkpoint /
> restore, and I will try to get the filter dumping patching in the same
> development cycle as it comes at minimal risk. Otherwise, we risk
> introducing a feature which could break checkpoint/restore, even in
> unprivileged containers since anyone can load a BPF Seccomp filter.

There is no rush to merge such a drastic expansion of the seccomp
attack surface. :) For me, the driving feature is if we can get
Tycho's notifier implemented in eBPF. The speed improvements, as far
as I'm concerned, aren't sufficient to add eBPF to seccomp. They are
certainly a nice benefit, but seccomp must be very conservative about
adding attack surface.

-Kees

-- 
Kees Cook
Pixel Security

Re: [net-next v3 0/2] eBPF seccomp filters

2018-02-26 Thread Kees Cook

On Mon, Feb 26, 2018 at 3:04 PM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Mon, Feb 26, 2018 at 07:26:54AM +, Sargun Dhillon wrote:
>> This patchset enables seccomp filters to be written in eBPF. Although, this
>> [...]
> The main statement I want to hear from seccomp maintainers before
> proceeding any further on this that enabling eBPF in seccomp won't lead
> to seccomp folks arguing against changes in bpf core (like verifier)
> just because it's used by seccomp.
> It must be spelled out in the commit log with explicit Ack.

The primary thing I'm concerned about with eBPF and seccomp is
side-effects from eBPF programs running at syscall time. This is an
extremely sensitive area, and I want to be sure there won't be
feature-creep here that leads to seccomp getting into a bad state.

As long as seccomp can continue have its own verifier, I *think* this
will be fine, though, again I remain concerned about maps, etc. I'm
still reviewing these patches and how they might provide overlap with
Tycho's needs too, etc.

-Kees

-- 
Kees Cook
Pixel Security

nla_put_string() vs NLA_STRING

2018-02-20 Thread Kees Cook

Hi,

It seems that in at least one case[1], nla_put_string() is being used
on an NLA_STRING, which lacks a NULL terminator, which leads to
silliness when nla_put_string() uses strlen() to figure out the size:

/**
 * nla_put_string - Add a string netlink attribute to a socket buffer
 * @skb: socket buffer to add attribute to
 * @attrtype: attribute type
 * @str: NUL terminated string
*/
static inline int nla_put_string(struct sk_buff *skb, int attrtype,
const char *str)
{
return nla_put(skb, attrtype, strlen(str) + 1, str);
}


This is a problem at least here:

struct regulatory_request {
...
char alpha2[2];
...

static const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = {
...
[NL80211_ATTR_REG_ALPHA2] = { .type = NLA_STRING, .len = 2 },
...

AIUI, working with NLA_STRING needs nla_strlcpy() to "extract" them,
and that takes the nla_policy size normally to bounds-check the copy.


So, this specific problem needs fixing (in at least two places calling
nla_put_string(msg, NL80211_ATTR_REG_ALPHA2, ...)). While I suspect
it's only ever written an extra byte from the following variable in
the structure which is an enum nl80211_dfs_regions, I worry there
might be a lot more of these (though I'd hope unterminated strings are
uncommon for internal representation). And more generally, it seems
like only the NLA _input_ functions actually check nla_policy details.
It seems that the output functions should do the same too, yes?

-Kees

[1] https://github.com/copperhead/linux-hardened/issues/72

-- 
Kees Cook
Pixel Security

[PATCH] NFC: llcp: Limit size of SDP URI

2018-02-14 Thread Kees Cook

The tlv_len is u8, so we need to limit the size of the SDP URI. Enforce
this both in the NLA policy and in the code that performs the allocation
and copy, to avoid writing past the end of the allocated buffer.

Fixes: d9b8d8e19b073 ("NFC: llcp: Service Name Lookup netlink interface")
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
No reply from linux-wireless, so sending to netdev directly now...
---
 net/nfc/llcp_commands.c | 4 
 net/nfc/netlink.c   | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
index 367d8c027101..2ceefa183cee 100644
--- a/net/nfc/llcp_commands.c
+++ b/net/nfc/llcp_commands.c
@@ -149,6 +149,10 @@ struct nfc_llcp_sdp_tlv *nfc_llcp_build_sdreq_tlv(u8 tid, 
char *uri,
 
pr_debug("uri: %s, len: %zu\n", uri, uri_len);
 
+   /* sdreq->tlv_len is u8, takes uri_len, + 3 for header, + 1 for NULL */
+   if (WARN_ON_ONCE(uri_len > U8_MAX - 4))
+   return NULL;
+
sdreq = kzalloc(sizeof(struct nfc_llcp_sdp_tlv), GFP_KERNEL);
if (sdreq == NULL)
return NULL;
diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c
index c0b83dc9d993..f018eafc2a0d 100644
--- a/net/nfc/netlink.c
+++ b/net/nfc/netlink.c
@@ -61,7 +61,8 @@ static const struct nla_policy nfc_genl_policy[NFC_ATTR_MAX + 
1] = {
 };
 
 static const struct nla_policy nfc_sdp_genl_policy[NFC_SDP_ATTR_MAX + 1] = {
-   [NFC_SDP_ATTR_URI] = { .type = NLA_STRING },
+   [NFC_SDP_ATTR_URI] = { .type = NLA_STRING,
+  .len = U8_MAX - 4 },
[NFC_SDP_ATTR_SAP] = { .type = NLA_U8 },
 };
 
-- 
2.7.4


-- 
Kees Cook
Pixel Security

Re: [PATCH net-next 0/3] eBPF Seccomp filters

2018-02-13 Thread Kees Cook

On Tue, Feb 13, 2018 at 12:33 PM, Tom Hromatka <tom.hroma...@oracle.com> wrote:
> On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sar...@sargun.me> wrote:
>>
>> This patchset enables seccomp filters to be written in eBPF. Although,
>> this patchset doesn't introduce much of the functionality enabled by
>> eBPF, it lays the ground work for it.
>>
>> It also introduces the capability to dump eBPF filters via the PTRACE
>> API in order to make it so that CHECKPOINT_RESTORE will be satisifed.
>> In the attached samples, there's an example of this. One can then use
>> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
>> and use that at reload time.
>>
>> The primary reason for not adding maps support in this patchset is
>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
>> If we have a map that the BPF program can read, it can potentially
>> "change" privileges after running. It seems like doing writes only
>> is safe, because it can be pure, and side effect free, and therefore
>> not negatively effect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
>> to an agreement, this can be in a follow-up patchset.
>
>
>
> Coincidentally I also sent an RFC for adding eBPF hash maps to the seccomp
> userspace mailing list just last week:
> https://groups.google.com/forum/#!topic/libseccomp/pX6QkVF0F74
>
> The kernel changes I proposed are in this email:
> https://groups.google.com/d/msg/libseccomp/pX6QkVF0F74/ZUJlwI5qAwAJ
>
> In that email thread, Kees requested that I try out a binary tree in cBPF
> and evaluate its performance.  I just got a rough prototype working, and
> while not as fast as an eBPF hash map, the cBPF binary tree was a
> significant
> improvement over the linear list of ifs that are currently generated.  Also,
> it only required changing a single function within the libseccomp libary
> itself.
>
> https://github.com/drakenclimber/libseccomp/commit/87b36369f17385f5a7a4d95101185577fbf6203b
>
> Here are the results I am currently seeing using an in-house customer's
> seccomp filter and a simplistic test program that runs getppid() thousands
> of times.
>
> Test Case  minimum TSC ticks to make syscall
> 
> seccomp disabled 620
> getppid() at the front of 306-syscall seccomp filter 722
> getppid() in middle of 306-syscall seccomp filter   1392
> getppid() at the end of the 306-syscall filter  2452
> seccomp using a 306-syscall-sized EBPF hash map  800
> cBPF filter using a binary tree  922

I still think that's a crazy filter. :) It should be inverted to just
check the 26 syscalls and a final "greater than" test. I would expect
it to be faster still. :)

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH net-next 1/3] bpf, seccomp: Add eBPF filter capabilities

2018-02-13 Thread Kees Cook

ing in the real world, it should be fine, but I
might want to instead create a perm-check function and just call it in
both functions. (And likely write a self-test that checks this order,
if it doesn't already exist.)

>
> /* Prepare the new filter before holding any locks. */
> -   prepared = seccomp_prepare_user_filter(filter);
> +   if (filter_type == SECCOMP_SET_MODE_FILTER_EXTENDED)
> +   prepared = seccomp_prepare_extended_filter(filter);
> +   else if (filter_type == SECCOMP_SET_MODE_FILTER)
> +   prepared = seccomp_prepare_user_filter(filter);
> +   else
> +   return -EINVAL;
> +
> if (IS_ERR(prepared))
> return PTR_ERR(prepared);
>
> @@ -867,7 +920,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
>
> spin_lock_irq(>sighand->siglock);
>
> -   if (!seccomp_may_assign_mode(seccomp_mode))
> +   if (!seccomp_may_assign_mode(filter_mode))
> goto out;
>
> ret = seccomp_attach_filter(flags, prepared);
> @@ -876,7 +929,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
> /* Do not free the successfully attached filter. */
> prepared = NULL;
>
> -   seccomp_assign_mode(current, seccomp_mode);
> +   seccomp_assign_mode(current, filter_mode);

With a filter flag, the above hunks don't need to be changed, for example.

>  out:
> spin_unlock_irq(>sighand->siglock);
> if (flags & SECCOMP_FILTER_FLAG_TSYNC)
> @@ -926,7 +979,9 @@ static long do_seccomp(unsigned int op, unsigned int 
> flags,
> return -EINVAL;
> return seccomp_set_mode_strict();
> case SECCOMP_SET_MODE_FILTER:
> -   return seccomp_set_mode_filter(flags, uargs);
> +   return seccomp_set_mode_filter(flags, uargs, op);
> +   case SECCOMP_SET_MODE_FILTER_EXTENDED:
> +   return seccomp_set_mode_filter(flags, uargs, op);

And this isn't needed, since it would be passed as a flag.

> case SECCOMP_GET_ACTION_AVAIL:
> if (flags != 0)
> return -EINVAL;
> @@ -969,6 +1024,10 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char 
> __user *filter)
> op = SECCOMP_SET_MODE_FILTER;
> uargs = filter;
> break;
> +   case SECCOMP_MODE_FILTER_EXTENDED:
> +   op = SECCOMP_SET_MODE_FILTER_EXTENDED;
> +   uargs = filter;
> +   break;

Same.

> default:
> return -EINVAL;
> }
> @@ -1040,8 +1099,7 @@ long seccomp_get_filter(struct task_struct *task, 
> unsigned long filter_off,
> if (IS_ERR(filter))
> return PTR_ERR(filter);
>
> -   fprog = filter->prog->orig_prog;
> -   if (!fprog) {
> +   if (!bpf_prog_was_classic(filter->prog)) {
> /* This must be a new non-cBPF filter, since we save
>  * every cBPF filter's orig_prog above when
>  * CONFIG_CHECKPOINT_RESTORE is enabled.
> @@ -1050,6 +1108,7 @@ long seccomp_get_filter(struct task_struct *task, 
> unsigned long filter_off,
> goto out;
> }
>
> +   fprog = filter->prog->orig_prog;

I wonder if it would be easier to review to split eBPF install from
the eBPF "get filter" changes as separate patches?

> ret = fprog->len;
> if (!data)
> goto out;
> @@ -1239,6 +1298,55 @@ static int seccomp_actions_logged_handler(struct 
> ctl_table *ro_table, int write,
> return 0;
>  }
>
> +#ifdef CONFIG_SECCOMP_FILTER_EXTENDED
> +static bool seccomp_is_valid_access(int off, int size,
> +   enum bpf_access_type type,
> +   struct bpf_insn_access_aux *info)
> +{
> +   if (type != BPF_READ)
> +   return false;
> +
> +   if (off < 0 || off + size > sizeof(struct seccomp_data))
> +   return false;
> +
> +   switch (off) {
> +   case bpf_ctx_range_till(struct seccomp_data, args[0], args[5]):
> +   return (size == sizeof(__u64));
> +   case bpf_ctx_range(struct seccomp_data, nr):
> +   return (size == FIELD_SIZEOF(struct seccomp_data, nr));
> +   case bpf_ctx_range(struct seccomp_data, arch):
> +   return (size == FIELD_SIZEOF(struct seccomp_data, arch));
> +   case bpf_ctx_range(struct seccomp_data, instruction_pointer):
> +   return (size == FIELD_SIZEOF(struct seccomp_data,
> +instruction_pointer));
> +   }
> +
> +   return false;
> +}
> +
> +static const struct bpf_func_proto *
> +seccomp_func_proto(enum bpf_func_id func_id)
> +{
> +   switch (func_id) {
> +   case BPF_FUNC_get_current_uid_gid:
> +   return _get_current_uid_gid_proto;
> +   case BPF_FUNC_trace_printk:
> +   if (capable(CAP_SYS_ADMIN))
> +   return bpf_get_trace_printk_proto();
> +   default:
> +   return NULL;
> +   }
> +}

This makes me so uncomfortable. :) Why is uid/gid needed? Why add
printk support here? (And why is it CAP_SYS_ADMIN checked if the
entire filter is CAP_SYS_ADMIN checked before being attached?)

> +
> +const struct bpf_prog_ops seccomp_prog_ops = {
> +};
> +
> +const struct bpf_verifier_ops seccomp_verifier_ops = {
> +   .get_func_proto = seccomp_func_proto,
> +   .is_valid_access= seccomp_is_valid_access,
> +};
> +#endif /* CONFIG_SECCOMP_FILTER_EXTENDED */
> +
>  static struct ctl_path seccomp_sysctl_path[] = {
> { .procname = "kernel", },
> { .procname = "seccomp", },
> --
> 2.14.1
>

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH net-next 3/3] bpf: Add eBPF seccomp sample programs

2018-02-13 Thread Kees Cook

On Tue, Feb 13, 2018 at 7:43 AM, Sargun Dhillon <sar...@sargun.me> wrote:
> +++ b/samples/bpf/seccomp1_kern.c
> @@ -0,0 +1,17 @@
> +#include 
> +#include 
> +#include 
> +#include "bpf_helpers.h"
> +#include 
> +
> +/* Returns EPERM when trying to close fd 999 */
> +SEC("seccomp")
> +int bpf_prog1(struct seccomp_data *ctx)
> +{
> +   if (ctx->nr == __NR_close && ctx->args[0] == 999)
> +   return SECCOMP_RET_ERRNO | EPERM;
> +
> +   return SECCOMP_RET_ALLOW;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> [...]
> +++ b/samples/bpf/seccomp2_kern.c
> @@ -0,0 +1,24 @@
> +#include 
> +#include 
> +#include 
> +#include "bpf_helpers.h"
> +#include 
> +
> +static inline int unknown(struct seccomp_data *ctx)
> +{
> +   if (ctx->args[0] % 2 == 0)
> +   return SECCOMP_RET_KILL;
> +   return SECCOMP_RET_LOG;
> +}
> +
> +/* Returns errno on sched_yield syscall */
> +SEC("seccomp")
> +int bpf_prog1(struct seccomp_data *ctx)
> +{
> +   if (ctx->nr == __NR_sched_yield)
> +   return SECCOMP_RET_ERRNO | EPERM;
> +
> +   return SECCOMP_RET_ALLOW;
> +}
> +
> +char _license[] SEC("license") = "aGPL";

Nit: these should check architecture before syscall number. Since
they're samples, people look at them for and copy them regularly, they
should be as safe/correct as possible.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH net-next 0/3] eBPF Seccomp filters

2018-02-13 Thread Kees Cook

On Tue, Feb 13, 2018 at 9:31 AM, Sargun Dhillon <sar...@sargun.me> wrote:
> On Tue, Feb 13, 2018 at 9:02 AM, Jessie Frazelle <m...@jessfraz.com> wrote:
>> On Tue, Feb 13, 2018 at 11:29 AM, Sargun Dhillon <sar...@sargun.me> wrote:
>>> On Tue, Feb 13, 2018 at 7:47 AM, Kees Cook <keesc...@chromium.org> wrote:
>>>> What's the reason for adding eBPF support? seccomp shouldn't need it,
>>>> and it only makes the code more complex. I'd rather stick with  -- cBPF
>>>> until we have an overwhelmingly good reason to use eBPF as a "native"
>>>> seccomp filter language.
>>>>
>>> Three reasons:
>>> 1) The userspace tooling for eBPF is much better than the user space
>>> tooling for cBPF. Our use case is specifically to optimize Docker
>>> policies. This is roughly what their seccomp policy looks like:
>>> https://github.com/moby/moby/blob/master/profiles/seccomp/default.json.
>>> It would be much nicer to be able to leverage eBPF to write this in C,
>>> or any other the other languages targetting eBPF. In addition, if we
>>> have write-only maps, we can exfiltrate information from seccomp, like
>>> arguments, and errors in a relatively cheap way compared to cBPF, and
>>> then extract this via the bcc stack. Writing cBPF via C macros is a
>>> pain, and the off the shelf cBPF libraries are getting no love. The
>>> eBPF community is *exploding* with contributions.

eBPF moving quickly is a disincentive from my perspective, as I want
absolutely zero surprises when it comes to seccomp. :) Given the
steady stream of exploitable flaws in eBPF, I don't want seccomp
anywhere near it. :( Many distros ship with the bpf() syscall
disabled, for example (or entirely compiled out, as in Chrome OS and
Android).

The convenience of writing C for eBPF output is certainly nice, but it
seems like either LLVM could grow a cBPF backend, or libseccomp could
be improved to provide the needed features.

Can you explain the exfiltration piece? Do you mean it would be
"cheap" in the sense that the results can be stored and studied
without needing a ptrace manager to catch the failures?

I remain unconvinced that seccomp needs a more descriptive language,
given its limited usage.

> A really naive approach is to take the JSON seccomp policy document
> and converting it to plain old C with switch / case statements. Then
> we can just push that through LLVM and we're in business. Although,
> for some reason, I don't think the folks will want to take a hard dep
> on llvm at runtime, so maybe there's some mechanism where it first
> tries llvm, then tries to create a eBPF application naively, and then
> falls back to cBPF. My primary fear with the first two approaches is
> that given how the policies are written today, it's not conducive to
> the eBPF instruction limit.

How about having libseccomp grow a JSON parser?

>>> 2) In my testing, which thus so far has been very rudimentary, with
>>> rewriting the policy that libseccomp generates from the Docker policy
>>> to use eBPF, and eBPF maps performs much better than cBPF. The
>>> specific case tested was to use a bpf array to lookup rules for a
>>> particular syscall. In a super trivial test, this was about 5% low
>>> latency than using traditional branches. If you need more evidence of
>>> this, I can work a little bit more on the maps related patches, and
>>> see if I can get some more benchmarking. From my understanding, we
>>> would need to add "sealing" support for maps, in which they can be
>>> marked as read-only, and only at that point should an eBPF seccomp
>>> program be able to read from them.

This came up recently on the libseccomp mailing list. The map lookup
is faster than a linear search, but for large filters, the filter can
be written as a balanced tree (as Chrome does), or reordered by
syscall frequency (as is recommended by minijail), and that appears to
get a much larger improvement than even the map lookup.

>>> 3) Eventually, I'd like to use some more advanced capabilities of
>>> eBPF, like being able to rewrite arguments safely (not things referred
>>> to by pointers, but just plain old arguments).

Much like 1), I don't find this an incentive, as the interactions
become much harder to reason about, and I am concerned we'll open
seccomp up to attack for a relatively small benefit. However,
rewriting arguments has come up in very narrow cases, and Tycho was
working on a method of doing userspace notifications (i.e. without a
ptrace manager) to get us closer.

If the needs Tycho outlined[1] could be addressed fully with eBPF, and
we can very narrowly scope the use of the "extra"

Re: [PATCH net-next 0/3] eBPF Seccomp filters

2018-02-13 Thread Kees Cook

On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sar...@sargun.me> wrote:
> This patchset enables seccomp filters to be written in eBPF. Although,
> this patchset doesn't introduce much of the functionality enabled by
> eBPF, it lays the ground work for it.
>
> It also introduces the capability to dump eBPF filters via the PTRACE
> API in order to make it so that CHECKPOINT_RESTORE will be satisifed.
> In the attached samples, there's an example of this. One can then use
> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
> and use that at reload time.
>
> The primary reason for not adding maps support in this patchset is
> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
> If we have a map that the BPF program can read, it can potentially
> "change" privileges after running. It seems like doing writes only
> is safe, because it can be pure, and side effect free, and therefore
> not negatively effect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
> to an agreement, this can be in a follow-up patchset.

What's the reason for adding eBPF support? seccomp shouldn't need it,
and it only makes the code more complex. I'd rather stick with cBPF
until we have an overwhelmingly good reason to use eBPF as a "native"
seccomp filter language.

-Kees

>
>
> Sargun Dhillon (3):
>   bpf, seccomp: Add eBPF filter capabilities
>   seccomp, ptrace: Add a mechanism to retrieve attached eBPF seccomp
> filters
>   bpf: Add eBPF seccomp sample programs
>
>  arch/Kconfig |   7 ++
>  include/linux/bpf_types.h|   3 +
>  include/linux/seccomp.h  |  12 +++
>  include/uapi/linux/bpf.h |   2 +
>  include/uapi/linux/ptrace.h  |   5 +-
>  include/uapi/linux/seccomp.h |  15 ++--
>  kernel/bpf/syscall.c |   1 +
>  kernel/ptrace.c  |   3 +
>  kernel/seccomp.c | 185 
> ++-
>  samples/bpf/Makefile |   9 +++
>  samples/bpf/bpf_load.c   |   9 ++-
>  samples/bpf/seccomp1_kern.c  |  17 
>  samples/bpf/seccomp1_user.c  |  34 
>  samples/bpf/seccomp2_kern.c  |  24 ++
>  samples/bpf/seccomp2_user.c  |  66 +++
>  15 files changed, 362 insertions(+), 30 deletions(-)
>  create mode 100644 samples/bpf/seccomp1_kern.c
>  create mode 100644 samples/bpf/seccomp1_user.c
>  create mode 100644 samples/bpf/seccomp2_kern.c
>  create mode 100644 samples/bpf/seccomp2_user.c
>
> --
> 2.14.1
>



-- 
Kees Cook
Pixel Security

Re: [PATCH] net: Whitelist the skbuff_head_cache "cb" field

2018-02-08 Thread Kees Cook

On Fri, Feb 9, 2018 at 7:16 AM, David Miller <da...@davemloft.net> wrote:
> From: Kees Cook <keesc...@chromium.org>
> Date: Wed, 7 Feb 2018 17:44:38 -0800
>
>> Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
>> Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
>> result of the cmsg header calculations. This means that hardened usercopy
>> will examine the copy, even though it was technically a fixed size and
>> should be implicitly whitelisted. All the put_cmsg() calls being built
>> from values in skbuff_head_cache are coming out of the protocol-defined
>> "cb" field, so whitelist this field entirely instead of creating per-use
>> bounce buffers, for which there are concerns about performance.
>>
>> Original report was:
>  ...
>> Reported-by: syzbot+e2d6cfb305e9f3911...@syzkaller.appspotmail.com
>> Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>> ---
>> I tried the inlining, it was awful. Splitting put_cmsg() was awful. So,
>> instead, whitelist the "cb" field as the least bad option if bounce
>> buffers are unacceptable. Dave, do you want to take this through net, or
>> should I take it through the usercopy tree?
>
> Thanks Kees, I'll take this through my 'net' tree.

Cool, thanks. And just to be clear, if it's not already obvious, this
patch needs kmem_cache_create_usercopy() which just landed in Linus's
tree last week, in case you've not merged yet.

-Kees

-- 
Kees Cook
Pixel Security

[PATCH] net: Whitelist the skbuff_head_cache "cb" field

2018-02-07 Thread Kees Cook

Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
result of the cmsg header calculations. This means that hardened usercopy
will examine the copy, even though it was technically a fixed size and
should be implicitly whitelisted. All the put_cmsg() calls being built
from values in skbuff_head_cache are coming out of the protocol-defined
"cb" field, so whitelist this field entirely instead of creating per-use
bounce buffers, for which there are concerns about performance.

Original report was:

Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from 
SLAB object 'skbuff_head_cache' (offset 64, size 16)!
WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 
mm/usercopy.c:76
...
 __check_heap_object+0x89/0xc0 mm/slab.c:4426
 check_heap_object mm/usercopy.c:236 [inline]
 __check_object_size+0x272/0x530 mm/usercopy.c:259
 check_object_size include/linux/thread_info.h:112 [inline]
 check_copy_size include/linux/thread_info.h:143 [inline]
 copy_to_user include/linux/uaccess.h:154 [inline]
 put_cmsg+0x233/0x3f0 net/core/scm.c:242
 sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913
 packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296
 sock_recvmsg_nosec net/socket.c:803 [inline]
 sock_recvmsg+0xc9/0x110 net/socket.c:810
 ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179
 __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287
 SYSC_recvmmsg net/socket.c:2368 [inline]
 SyS_recvmmsg+0xc4/0x160 net/socket.c:2352
 entry_SYSCALL_64_fastpath+0x29/0xa0

Reported-by: syzbot+e2d6cfb305e9f3911...@syzkaller.appspotmail.com
Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
I tried the inlining, it was awful. Splitting put_cmsg() was awful. So,
instead, whitelist the "cb" field as the least bad option if bounce
buffers are unacceptable. Dave, do you want to take this through net, or
should I take it through the usercopy tree?
---
 net/core/skbuff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6b0ff396fa9d..201b96c8f414 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3889,10 +3889,12 @@ EXPORT_SYMBOL_GPL(skb_gro_receive);
 
 void __init skb_init(void)
 {
-   skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
+   skbuff_head_cache = kmem_cache_create_usercopy("skbuff_head_cache",
  sizeof(struct sk_buff),
  0,
  SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+ offsetof(struct sk_buff, cb),
+ sizeof_field(struct sk_buff, cb),
  NULL);
skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
    sizeof(struct sk_buff_fclones),
-- 
2.7.4


-- 
Kees Cook
Pixel Security

Re: [PATCH v2] socket: Provide put_cmsg_whitelist() for constant size copies

2018-02-06 Thread Kees Cook

On Wed, Feb 7, 2018 at 3:19 AM, David Miller <da...@davemloft.net> wrote:
> From: Kees Cook <keesc...@chromium.org>
> Date: Tue, 6 Feb 2018 04:31:50 +1100
>
>> On Tue, Feb 6, 2018 at 2:03 AM, David Miller <da...@davemloft.net> wrote:
>>> From: Kees Cook <keesc...@chromium.org>
>>> Date: Fri, 2 Feb 2018 02:27:49 -0800
>>>
>>>> @@ -343,6 +343,14 @@ struct ucred {
>>>>
>>>>  extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct 
>>>> sockaddr_storage *kaddr);
>>>>  extern int put_cmsg(struct msghdr*, int level, int type, int len, void 
>>>> *data);
>>>> +/*
>>>> + * Provide a bounce buffer for copying cmsg data to userspace when the
>>>> + * target memory isn't already whitelisted for hardened usercopy.
>>>> + */
>>>> +#define put_cmsg_whitelist(_msg, _level, _type, _ptr) ({ \
>>>> + typeof(*(_ptr)) _val = *(_ptr); \
>>>> + put_cmsg(_msg, _level, _type, sizeof(_val), &_val); \
>>>> + })
>>>
>>> I understand what you are trying to achieve, but it's at a real cost
>>> here.  Some of these objects are structures, for example the struct
>>> sock_extended_err is 16 bytes.
>>
>> It didn't look like put_cmsg() was on a fast path, so it seemed like a
>> bounce buffer was the best solution here (and it's not without
>> precedent).
>
> For some things like timestamps it can be important.

Making put_cmsg() inline would help quite a bit with tracking the
builtin_const-ness, and that could speed things up a little bit too.
Would you be opposed to inlining?

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v2] socket: Provide put_cmsg_whitelist() for constant size copies

2018-02-05 Thread Kees Cook

On Tue, Feb 6, 2018 at 2:03 AM, David Miller <da...@davemloft.net> wrote:
> From: Kees Cook <keesc...@chromium.org>
> Date: Fri, 2 Feb 2018 02:27:49 -0800
>
>> @@ -343,6 +343,14 @@ struct ucred {
>>
>>  extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct 
>> sockaddr_storage *kaddr);
>>  extern int put_cmsg(struct msghdr*, int level, int type, int len, void 
>> *data);
>> +/*
>> + * Provide a bounce buffer for copying cmsg data to userspace when the
>> + * target memory isn't already whitelisted for hardened usercopy.
>> + */
>> +#define put_cmsg_whitelist(_msg, _level, _type, _ptr) ({ \
>> + typeof(*(_ptr)) _val = *(_ptr); \
>> + put_cmsg(_msg, _level, _type, sizeof(_val), &_val); \
>> + })
>
> I understand what you are trying to achieve, but it's at a real cost
> here.  Some of these objects are structures, for example the struct
> sock_extended_err is 16 bytes.

It didn't look like put_cmsg() was on a fast path, so it seemed like a
bounce buffer was the best solution here (and it's not without
precedent).

> And now we're going to copy it twice, once into the on-stack copy,
> and then once again into the CMSG blob.
>
> Please find a way to make hardened user copy happy without adding
> new overhead.

Another idea would be breaking put_cmsg() up into a macro with helper
functions, rearrange the arguments to avoid the math, and leaving the
copy_to_user() inline to see the const-ness, but that seemed way
uglier to me.

I'll think about it some more, but I think having put_cmsg_whitelist()
called only in a few places is reasonable here.

-Kees

-- 
Kees Cook
Pixel Security

[PATCH v2] socket: Provide put_cmsg_whitelist() for constant size copies

2018-02-02 Thread Kees Cook

Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
But within put_cmsg(), the copy_to_user() call is made with a dynamic
length, as a result of the cmsg header calculations. This means that
hardened usercopy will examine the copy, even though it was technically
a fixed size and should be implicitly whitelisted.

Most callers of put_cmsg() are copying out of stack or kmalloc, so these
cases aren't a problem for hardened usercopy. However, some try to copy
out of the skbuff_head_cache slab, including the "cb" region. Since
whitelisting the slab area would leave other protocol definition of the
"cb" region exposed to usercopy bugs, this creates put_cmsg_whitelist(),
which internally uses sizeof() to provide a constant-sized length and
a stack bounce buffer, in order to explicitly whitelist an otherwise
disallowed slab region.

Original report was:

Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from 
SLAB object 'skbuff_head_cache' (offset 64, size 16)!
WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 
mm/usercopy.c:76
...
 __check_heap_object+0x89/0xc0 mm/slab.c:4426
 check_heap_object mm/usercopy.c:236 [inline]
 __check_object_size+0x272/0x530 mm/usercopy.c:259
 check_object_size include/linux/thread_info.h:112 [inline]
 check_copy_size include/linux/thread_info.h:143 [inline]
 copy_to_user include/linux/uaccess.h:154 [inline]
 put_cmsg+0x233/0x3f0 net/core/scm.c:242
 sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913
 packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296
 sock_recvmsg_nosec net/socket.c:803 [inline]
 sock_recvmsg+0xc9/0x110 net/socket.c:810
 ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179
 __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287
 SYSC_recvmmsg net/socket.c:2368 [inline]
 SyS_recvmmsg+0xc4/0x160 net/socket.c:2352
 entry_SYSCALL_64_fastpath+0x29/0xa0

Reported-by: syzbot+e2d6cfb305e9f3911...@syzkaller.appspotmail.com
Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 include/linux/socket.h   | 8 
 net/core/sock.c  | 4 +---
 net/iucv/af_iucv.c   | 5 ++---
 net/netlink/af_netlink.c | 4 ++--
 net/socket.c | 4 ++--
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 9286a5a8c60c..1f52e998068b 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -343,6 +343,14 @@ struct ucred {
 
 extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct 
sockaddr_storage *kaddr);
 extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);
+/*
+ * Provide a bounce buffer for copying cmsg data to userspace when the
+ * target memory isn't already whitelisted for hardened usercopy.
+ */
+#define put_cmsg_whitelist(_msg, _level, _type, _ptr) ({   \
+   typeof(*(_ptr)) _val = *(_ptr); \
+   put_cmsg(_msg, _level, _type, sizeof(_val), &_val); \
+   })
 
 struct timespec;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index f39206b41b32..d8a3228acfd0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2879,7 +2879,6 @@ void sock_enable_timestamp(struct sock *sk, int flag)
 int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len,
   int level, int type)
 {
-   struct sock_exterr_skb *serr;
struct sk_buff *skb;
int copied, err;
 
@@ -2899,8 +2898,7 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr 
*msg, int len,
 
sock_recv_timestamp(msg, sk, skb);
 
-   serr = SKB_EXT_ERR(skb);
-   put_cmsg(msg, level, type, sizeof(serr->ee), >ee);
+   put_cmsg_whitelist(msg, level, type, _EXT_ERR(skb)->ee);
 
msg->msg_flags |= MSG_ERRQUEUE;
err = copied;
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 148533169b1d..676c019ba357 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1407,9 +1407,8 @@ static int iucv_sock_recvmsg(struct socket *sock, struct 
msghdr *msg,
/* create control message to store iucv msg target class:
 * get the trgcls from the control buffer of the skb due to
 * fragmentation of original iucv message. */
-   err = put_cmsg(msg, SOL_IUCV, SCM_IUCV_TRGCLS,
-  sizeof(IUCV_SKB_CB(skb)->class),
-  (void *)_SKB_CB(skb)->class);
+   err = put_cmsg_whitelist(msg, SOL_IUCV, SCM_IUCV_TRGCLS,
+_SKB_CB(skb)->class);
if (err) {
if (!(flags & MSG_PEEK))
skb_queue_head(>sk_receive_queue, skb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b9e0ee4e22f5..4420dba35a44 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1781,8 +1781,8 @@ static void netlink_cmsg_listen_all_nsid(

[PATCH] socket: Provide bounce buffer for constant sized put_cmsg()

2018-02-01 Thread Kees Cook

Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
result of the cmsg header calculations. This means that hardened usercopy
will examine the copy, even though it was technically a fixed size and
should be implicitly whitelisted. Since most whitelists for put_cmsg()
would need to be in skbuff_head_cache on a per-protocol basis, avoid this
complexity by just providing small bounce buffers where the size is fixed.

Original report was:

Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from 
SLAB object 'skbuff_head_cache' (offset 64, size 16)!
WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 
mm/usercopy.c:76
...
 __check_heap_object+0x89/0xc0 mm/slab.c:4426
 check_heap_object mm/usercopy.c:236 [inline]
 __check_object_size+0x272/0x530 mm/usercopy.c:259
 check_object_size include/linux/thread_info.h:112 [inline]
 check_copy_size include/linux/thread_info.h:143 [inline]
 copy_to_user include/linux/uaccess.h:154 [inline]
 put_cmsg+0x233/0x3f0 net/core/scm.c:242
 sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913
 packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296
 sock_recvmsg_nosec net/socket.c:803 [inline]
 sock_recvmsg+0xc9/0x110 net/socket.c:810
 ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179
 __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287
 SYSC_recvmmsg net/socket.c:2368 [inline]
 SyS_recvmmsg+0xc4/0x160 net/socket.c:2352
 entry_SYSCALL_64_fastpath+0x29/0xa0

Reported-by: syzbot+e2d6cfb305e9f3911...@syzkaller.appspotmail.com
Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 include/linux/socket.h | 18 +-
 net/core/scm.c |  4 ++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 9286a5a8c60c..b3c5a075b7b3 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -342,7 +342,23 @@ struct ucred {
 #define IPX_TYPE   1
 
 extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct 
sockaddr_storage *kaddr);
-extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);
+extern int __put_cmsg(struct msghdr*, int level, int type, int len, void 
*data);
+/*
+ * Provide a bounce buffer for copying cmsg data to userspace when the size
+ * is constant. Without this, hardened usercopy will see the dynamic size
+ * calculation in __put_cmsg and try to block it. Constant sized copies
+ * should not trigger hardened usercopy checks.
+ */
+#define put_cmsg(_msg, _level, _type, _len, _ptr) ({   \
+   int _rc;\
+   if (__builtin_constant_p(_len)) {   \
+   typeof(*(_ptr)) _val = *(_ptr); \
+   BUILD_BUG_ON(sizeof(_val) != (_len));   \
+   _rc = __put_cmsg(_msg, _level, _type, sizeof(_val), &_val); \
+   } else {\
+   _rc = __put_cmsg(_msg, _level, _type, _len, _ptr);  \
+   }   \
+   _rc;})
 
 struct timespec;
 
diff --git a/net/core/scm.c b/net/core/scm.c
index b1ff8a441748..3a3ecf528800 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -213,7 +213,7 @@ int __scm_send(struct socket *sock, struct msghdr *msg, 
struct scm_cookie *p)
 }
 EXPORT_SYMBOL(__scm_send);
 
-int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
+int __put_cmsg(struct msghdr *msg, int level, int type, int len, void *data)
 {
struct cmsghdr __user *cm
= (__force struct cmsghdr __user *)msg->msg_control;
@@ -250,7 +250,7 @@ int put_cmsg(struct msghdr * msg, int level, int type, int 
len, void *data)
 out:
return err;
 }
-EXPORT_SYMBOL(put_cmsg);
+EXPORT_SYMBOL(__put_cmsg);
 
 void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 {
-- 
2.7.4


-- 
Kees Cook
Pixel Security

Re: [PATCH] RDS: Fix rds-ping inducing kernel panic

2018-01-22 Thread Kees Cook

On Tue, Jan 23, 2018 at 4:01 AM, Santosh Shilimkar
<santosh.shilim...@oracle.com> wrote:
> On 1/22/2018 3:24 AM, Kees Cook wrote:
>>
>> As described in: https://bugzilla.redhat.com/show_bug.cgi?id=822754
>>
>> Attempting an RDS connection from the IP address of an IPoIB interface
>> to itself causes a kernel panic due to a BUG_ON() being triggered.
>> Making the test less strict allows rds-ping to work without crashing
>> the machine.
>>
>> A local unprivileged user could use this flaw to crash the sytem.
>>
> Are you able to reproduce this issue on mainline kernel ?
> IIRC, this sjouldn't happen anymore but if you see it, please
> let me know. Will try it as well. rds-ping on self
> loopback device is often tested and used as well for
> monitoring services in production.

I don't have an RDS test setup, no. But it sounds like kernels without
this patch aren't seeing the problem.

> Am not sure if its applicable anymore. Infact the issue with
> loopback device was due to congestion update and thats been
> already addressed with commit '18fc25c94: {rds: prevent BUG_ON
> triggered on congestion update to loopback}'

That looks very much like it was fixed there. Thanks!

-Kees

-- 
Kees Cook
Pixel Security

[PATCH] RDS: Fix rds-ping inducing kernel panic

2018-01-22 Thread Kees Cook

As described in: https://bugzilla.redhat.com/show_bug.cgi?id=822754

Attempting an RDS connection from the IP address of an IPoIB interface
to itself causes a kernel panic due to a BUG_ON() being triggered.
Making the test less strict allows rds-ping to work without crashing
the machine.

A local unprivileged user could use this flaw to crash the sytem.

I think this fix was written by Jay Fenlason <fenla...@redhat.com>,
and extracted from the RedHat kernel patches here:

https://oss.oracle.com/git/gitweb.cgi?p=redpatch.git;a=commitdiff;h=c7b6a0a1d8d636852be130fa15fa8be10d4704e8

This fix appears to have been carried by at least RedHat, Oracle, and
Ubuntu for several years.

CVE-2012-2372

Reported-by: Honggang Li <ho...@redhat.com>
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
This is what I get for researching CVE lifetimes...
---
 net/rds/ib_send.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index 8557a1cae041..5fbf635d17cb 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -506,7 +506,7 @@ int rds_ib_xmit(struct rds_connection *conn, struct 
rds_message *rm,
int flow_controlled = 0;
int nr_sig = 0;
 
-   BUG_ON(off % RDS_FRAG_SIZE);
+   BUG_ON(!conn->c_loopback && off % RDS_FRAG_SIZE);
BUG_ON(hdr_off != 0 && hdr_off != sizeof(struct rds_header));
 
/* Do not send cong updates to IB loopback */
-- 
2.7.4


-- 
Kees Cook
Pixel Security

Re: [PATCH 27/38] sctp: Copy struct sctp_sock.autoclose to userspace using put_user()

2018-01-18 Thread Kees Cook

On Thu, Jan 18, 2018 at 1:31 PM, Laura Abbott <labb...@redhat.com> wrote:
> On 01/10/2018 06:02 PM, Kees Cook wrote:
>>
>> From: David Windsor <d...@nullcore.net>
>>
>> The autoclose field can be copied with put_user(), so there is no need to
>> use copy_to_user(). In both cases, hardened usercopy is being bypassed
>> since the size is constant, and not open to runtime manipulation.
>>
>> This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
>> whitelisting code in the last public patch of grsecurity/PaX based on my
>> understanding of the code. Changes or omissions from the original code are
>> mine and don't reflect the original grsecurity/PaX code.
>>
>
> Just tried a quick rebase and it looks like this conflicts with
> c76f97c99ae6 ("sctp: make use of pre-calculated len")
> I don't think we can use put_user if we're copying via the full
> len?

It should be fine, since:

len = sizeof(int);

c76f97c99ae6 just does a swap of sizeof(int) with len, put_user() will
work in either case, since autoclose will always be int sized.

-Kees

>
> Thanks,
> Laura
>
>
>> Signed-off-by: David Windsor <d...@nullcore.net>
>> [kees: adjust commit log]
>> Cc: Vlad Yasevich <vyasev...@gmail.com>
>> Cc: Neil Horman <nhor...@tuxdriver.com>
>> Cc: "David S. Miller" <da...@davemloft.net>
>> Cc: linux-s...@vger.kernel.org
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>> ---
>>   net/sctp/socket.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index efbc8f52c531..15491491ec88 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -5011,7 +5011,7 @@ static int sctp_getsockopt_autoclose(struct sock
>> *sk, int len, char __user *optv
>> len = sizeof(int);
>> if (put_user(len, optlen))
>> return -EFAULT;
>> -   if (copy_to_user(optval, _sk(sk)->autoclose, sizeof(int)))
>> +   if (put_user(sctp_sk(sk)->autoclose, (int __user *)optval))
>> return -EFAULT;
>> return 0;
>>   }
>>
>



-- 
Kees Cook
Pixel Security

Re: [PATCH 33/38] arm64: Implement thread_struct whitelist for hardened usercopy

2018-01-15 Thread Kees Cook

On Mon, Jan 15, 2018 at 4:24 AM, Dave P Martin <dave.mar...@arm.com> wrote:
> On Thu, Jan 11, 2018 at 02:03:05AM +0000, Kees Cook wrote:
>> This whitelists the FPU register state portion of the thread_struct for
>> copying to userspace, instead of the default entire structure.
>>
>> Cc: Catalin Marinas <catalin.mari...@arm.com>
>> Cc: Will Deacon <will.dea...@arm.com>
>> Cc: Christian Borntraeger <borntrae...@de.ibm.com>
>> Cc: Ingo Molnar <mi...@kernel.org>
>> Cc: James Morse <james.mo...@arm.com>
>> Cc: "Peter Zijlstra (Intel)" <pet...@infradead.org>
>> Cc: Dave Martin <dave.mar...@arm.com>
>> Cc: zijun_hu <zijun...@htc.com>
>> Cc: linux-arm-ker...@lists.infradead.org
>> Signed-off-by: Kees Cook <keesc...@chromium.org>
>> ---
>>  arch/arm64/Kconfig | 1 +
>>  arch/arm64/include/asm/processor.h | 8 
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index a93339f5178f..c84477e6a884 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -90,6 +90,7 @@ config ARM64
>>   select HAVE_ARCH_MMAP_RND_BITS
>>   select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
>>   select HAVE_ARCH_SECCOMP_FILTER
>> + select HAVE_ARCH_THREAD_STRUCT_WHITELIST
>>   select HAVE_ARCH_TRACEHOOK
>>   select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>>   select HAVE_ARCH_VMAP_STACK
>> diff --git a/arch/arm64/include/asm/processor.h 
>> b/arch/arm64/include/asm/processor.h
>> index 023cacb946c3..e58a5864ec89 100644
>> --- a/arch/arm64/include/asm/processor.h
>> +++ b/arch/arm64/include/asm/processor.h
>> @@ -113,6 +113,14 @@ struct thread_struct {
>>   struct debug_info   debug;  /* debugging */
>>  };
>>
>> +/* Whitelist the fpsimd_state for copying to userspace. */
>> +static inline void arch_thread_struct_whitelist(unsigned long *offset,
>> + unsigned long *size)
>> +{
>> + *offset = offsetof(struct thread_struct, fpsimd_state);
>> + *size = sizeof(struct fpsimd_state);
>
> This should be fpsimd_state.user_fpsimd (fpsimd_state.cpu is important
> for correctly context switching and not supposed to be user-accessible.
> A user copy that encompasses that is definitely a bug).

So, I actually spent some more time looking at this due to the
comments from rmk on arm32, and I don't think any whitelist is needed
here at all. (i.e. it can be *offset = *size = 0) This is because all
the usercopying I could find uses static sizes or bounce buffers, both
of which bypass the dynamic-size hardened usercopy checks.

I've been running some arm64 builds now with this change, and I
haven't tripped over any problems yet...

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH 02/38] usercopy: Enhance and rename report_usercopy()

2018-01-14 Thread Kees Cook

On Thu, Jan 11, 2018 at 9:06 AM, Christopher Lameter <c...@linux.com> wrote:
> On Wed, 10 Jan 2018, Kees Cook wrote:
>
>> diff --git a/mm/slab.h b/mm/slab.h
>> index ad657ffa44e5..7d29e69ac310 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -526,4 +526,10 @@ static inline int cache_random_seq_create(struct 
>> kmem_cache *cachep,
>>  static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
>>  #endif /* CONFIG_SLAB_FREELIST_RANDOM */
>>
>> +#ifdef CONFIG_HARDENED_USERCOPY
>> +void __noreturn usercopy_abort(const char *name, const char *detail,
>> +bool to_user, unsigned long offset,
>> +unsigned long len);
>> +#endif
>> +
>>  #endif /* MM_SLAB_H */
>
> This code has nothing to do with slab allocation. Move it into
> include/linux/uaccess.h where the other user space access definitions are?

Since it was only the mm/sl*b.c files using it, it seemed like the
right place, but it's a reasonable point. I've moved it now.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH 34/38] arm: Implement thread_struct whitelist for hardened usercopy

2018-01-11 Thread Kees Cook

On Thu, Jan 11, 2018 at 2:24 AM, Russell King - ARM Linux
<li...@armlinux.org.uk> wrote:
> On Wed, Jan 10, 2018 at 06:03:06PM -0800, Kees Cook wrote:
>> ARM does not carry FPU state in the thread structure, so it can declare
>> no usercopy whitelist at all.
>
> This comment seems to be misleading.  We have stored FP state in the
> thread structure for a long time - for example, VFP state is stored
> in thread->vfpstate.hard, so we _do_ have floating point state in
> the thread structure.
>
> What I think this commit message needs to describe is why we don't
> need a whitelist _despite_ having FP state in the thread structure.
>
> At the moment, the commit message is making me think that this patch
> is wrong and will introduce a regression.

Yeah, I will improve this comment; it's not clear enough. The places
where I see state copied to/from userspace are all either static sizes
or already use bounce buffers (or both). e.g.:

err |= __copy_from_user(>fpregs, >fpregs,
sizeof(hwstate->fpregs));

I will adjust the commit log and comment to more clearly describe the
lack of whitelisting due to all-static sized copies.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH 13/38] ext4: Define usercopy region in ext4_inode_cache slab cache

2018-01-11 Thread Kees Cook

On Thu, Jan 11, 2018 at 9:01 AM, Theodore Ts'o <ty...@mit.edu> wrote:
> On Wed, Jan 10, 2018 at 06:02:45PM -0800, Kees Cook wrote:
>> The ext4 symlink pathnames, stored in struct ext4_inode_info.i_data
>> and therefore contained in the ext4_inode_cache slab cache, need
>> to be copied to/from userspace.
>
> Symlink operations to/from userspace aren't common or in the hot path,
> and when they are in i_data, limited to at most 60 bytes.  Is it worth
> it to copy through a bounce buffer so as to disallow any usercopies
> into struct ext4_inode_info?

If this is the only place it's exposed, yeah, that might be a way to
avoid the per-FS patches. This would, AIUI, require changing
readlink_copy() to include a bounce buffer, and that would require an
allocation. I kind of prefer just leaving the per-FS whitelists, as
then there's no global overhead added.

-Kees

-- 
Kees Cook
Pixel Security

Re: linux-next: manual merge of the kspp tree with the net tree

2018-01-11 Thread Kees Cook

On Wed, Jan 10, 2018 at 8:40 PM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
> Hi Kees,
>
> Today's linux-next merge of the kspp tree got a conflict in:
>
>   net/sctp/socket.c
>
> between commit:
>
>   c76f97c99ae6 ("sctp: make use of pre-calculated len")
>
> from the net tree and commit:
>
>   3511d716f5a8 ("sctp: Copy struct sctp_sock.autoclose to userspace using 
> put_user()")
>
> from the kspp tree.
>
> I fixed it up (I just used the latter version) and can carry the fix as
> necessary. This is now fixed as far as linux-next is concerned, but any
> non trivial conflicts should be mentioned to your upstream maintainer
> when your tree is submitted for merging.  You may also want to consider
> cooperating with the maintainer of the conflicting tree to minimise any
> particularly complex conflicts.

Thanks! And yes, confirmed, the kspp tree version should be used to
resolve this conflict.

-Kees

-- 
Kees Cook
Pixel Security

[PATCH 35/38] kvm: whitelist struct kvm_vcpu_arch

2018-01-10 Thread Kees Cook

From: Paolo Bonzini <pbonz...@redhat.com>

On x86, ARM and s390, struct kvm_vcpu_arch has a usercopy region
that is read and written by the KVM_GET/SET_CPUID2 ioctls (x86)
or KVM_GET/SET_ONE_REG (ARM/s390).  Without whitelisting the area,
KVM is completely broken on those architectures with usercopy hardening
enabled.

For now, allow writing to the entire struct on all architectures.
The KVM tree will not refine this to an architecture-specific
subset of struct kvm_vcpu_arch.

Cc: kernel-harden...@lists.openwall.com
Cc: Kees Cook <keesc...@chromium.org>
Cc: Christian Borntraeger <borntrae...@redhat.com>
Cc: Christoffer Dall <cd...@linaro.org>
Cc: Radim Krčmář <rkrc...@redhat.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
Acked-by: Christoffer Dall <christoffer.d...@linaro.org>
Acked-by: Marc Zyngier <marc.zyng...@arm.com>
Acked-by: Christian Borntraeger <borntrae...@de.ibm.com>
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 virt/kvm/kvm_main.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c422c10cd1dd..96689967f5c3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4029,8 +4029,12 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned 
vcpu_align,
/* A kmem cache lets us meet the alignment requirements of fx_save. */
if (!vcpu_align)
vcpu_align = __alignof__(struct kvm_vcpu);
-   kvm_vcpu_cache = kmem_cache_create("kvm_vcpu", vcpu_size, vcpu_align,
-  SLAB_ACCOUNT, NULL);
+   kvm_vcpu_cache =
+   kmem_cache_create_usercopy("kvm_vcpu", vcpu_size, vcpu_align,
+  SLAB_ACCOUNT,
+  offsetof(struct kvm_vcpu, arch),
+  sizeof_field(struct kvm_vcpu, arch),
+  NULL);
if (!kvm_vcpu_cache) {
r = -ENOMEM;
goto out_free_3;
-- 
2.7.4

[PATCH 33/38] arm64: Implement thread_struct whitelist for hardened usercopy

2018-01-10 Thread Kees Cook

This whitelists the FPU register state portion of the thread_struct for
copying to userspace, instead of the default entire structure.

Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Christian Borntraeger <borntrae...@de.ibm.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: James Morse <james.mo...@arm.com>
Cc: "Peter Zijlstra (Intel)" <pet...@infradead.org>
Cc: Dave Martin <dave.mar...@arm.com>
Cc: zijun_hu <zijun...@htc.com>
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 arch/arm64/Kconfig | 1 +
 arch/arm64/include/asm/processor.h | 8 
 2 files changed, 9 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a93339f5178f..c84477e6a884 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -90,6 +90,7 @@ config ARM64
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
select HAVE_ARCH_SECCOMP_FILTER
+   select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_ARCH_VMAP_STACK
diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index 023cacb946c3..e58a5864ec89 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -113,6 +113,14 @@ struct thread_struct {
struct debug_info   debug;  /* debugging */
 };
 
+/* Whitelist the fpsimd_state for copying to userspace. */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+   unsigned long *size)
+{
+   *offset = offsetof(struct thread_struct, fpsimd_state);
+   *size = sizeof(struct fpsimd_state);
+}
+
 #ifdef CONFIG_COMPAT
 #define task_user_tls(t)   \
 ({ \
-- 
2.7.4

[PATCH 16/38] befs: Define usercopy region in befs_inode_cache slab cache

2018-01-10 Thread Kees Cook

From: David Windsor <d...@nullcore.net>

befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
and therefore contained in the befs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
fs/befs/linuxvfs.c:
befs_alloc_inode(...):
...
bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
...
return >vfs_inode;

befs_iget(...):
...
strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
BEFS_SYMLINK_LEN);
...
inode->i_link = befs_ino->i_data.symlink;

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);

(inlined in vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
befs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <d...@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Luis de Bethencourt <lui...@kernel.org>
Cc: Salah Triki <salah.tr...@gmail.com>
Signed-off-by: Kees Cook <keesc...@chromium.org>
Acked-by: Luis de Bethencourt <lui...@kernel.org>
---
 fs/befs/linuxvfs.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index ee236231cafa..af2832aaeec5 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -444,11 +444,15 @@ static struct inode *befs_iget(struct super_block *sb, 
unsigned long ino)
 static int __init
 befs_init_inodecache(void)
 {
-   befs_inode_cachep = kmem_cache_create("befs_inode_cache",
- sizeof (struct befs_inode_info),
- 0, (SLAB_RECLAIM_ACCOUNT|
-   SLAB_MEM_SPREAD|SLAB_ACCOUNT),
- init_once);
+   befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
+   sizeof(struct befs_inode_info), 0,
+   (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+   SLAB_ACCOUNT),
+   offsetof(struct befs_inode_info,
+   i_data.symlink),
+   sizeof_field(struct befs_inode_info,
+   i_data.symlink),
+   init_once);
if (befs_inode_cachep == NULL)
return -ENOMEM;
 
-- 
2.7.4

[PATCH 38/38] lkdtm: Update usercopy tests for whitelisting

2018-01-10 Thread Kees Cook

This updates the USERCOPY_HEAP_FLAG_* tests to USERCOPY_HEAP_WHITELIST_*,
since the final form of usercopy whitelisting ended up using an offset/size
window instead of the earlier proposed allocation flags.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/misc/lkdtm.h  |  4 +-
 drivers/misc/lkdtm_core.c |  4 +-
 drivers/misc/lkdtm_usercopy.c | 88 ---
 3 files changed, 53 insertions(+), 43 deletions(-)

diff --git a/drivers/misc/lkdtm.h b/drivers/misc/lkdtm.h
index 687a0dbbe199..9e513dcfd809 100644
--- a/drivers/misc/lkdtm.h
+++ b/drivers/misc/lkdtm.h
@@ -76,8 +76,8 @@ void __init lkdtm_usercopy_init(void);
 void __exit lkdtm_usercopy_exit(void);
 void lkdtm_USERCOPY_HEAP_SIZE_TO(void);
 void lkdtm_USERCOPY_HEAP_SIZE_FROM(void);
-void lkdtm_USERCOPY_HEAP_FLAG_TO(void);
-void lkdtm_USERCOPY_HEAP_FLAG_FROM(void);
+void lkdtm_USERCOPY_HEAP_WHITELIST_TO(void);
+void lkdtm_USERCOPY_HEAP_WHITELIST_FROM(void);
 void lkdtm_USERCOPY_STACK_FRAME_TO(void);
 void lkdtm_USERCOPY_STACK_FRAME_FROM(void);
 void lkdtm_USERCOPY_STACK_BEYOND(void);
diff --git a/drivers/misc/lkdtm_core.c b/drivers/misc/lkdtm_core.c
index ba92291508dc..e3f2bac4b245 100644
--- a/drivers/misc/lkdtm_core.c
+++ b/drivers/misc/lkdtm_core.c
@@ -177,8 +177,8 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(ATOMIC_TIMING),
CRASHTYPE(USERCOPY_HEAP_SIZE_TO),
CRASHTYPE(USERCOPY_HEAP_SIZE_FROM),
-   CRASHTYPE(USERCOPY_HEAP_FLAG_TO),
-   CRASHTYPE(USERCOPY_HEAP_FLAG_FROM),
+   CRASHTYPE(USERCOPY_HEAP_WHITELIST_TO),
+   CRASHTYPE(USERCOPY_HEAP_WHITELIST_FROM),
CRASHTYPE(USERCOPY_STACK_FRAME_TO),
CRASHTYPE(USERCOPY_STACK_FRAME_FROM),
CRASHTYPE(USERCOPY_STACK_BEYOND),
diff --git a/drivers/misc/lkdtm_usercopy.c b/drivers/misc/lkdtm_usercopy.c
index 9ebbb031e5e3..9725aed305bb 100644
--- a/drivers/misc/lkdtm_usercopy.c
+++ b/drivers/misc/lkdtm_usercopy.c
@@ -20,7 +20,7 @@
  */
 static volatile size_t unconst = 0;
 static volatile size_t cache_size = 1024;
-static struct kmem_cache *bad_cache;
+static struct kmem_cache *whitelist_cache;
 
 static const unsigned char test_text[] = "This is a test.\n";
 
@@ -115,6 +115,10 @@ static noinline void do_usercopy_stack(bool to_user, bool 
bad_frame)
vm_munmap(user_addr, PAGE_SIZE);
 }
 
+/*
+ * This checks for whole-object size validation with hardened usercopy,
+ * with or without usercopy whitelisting.
+ */
 static void do_usercopy_heap_size(bool to_user)
 {
unsigned long user_addr;
@@ -177,77 +181,79 @@ static void do_usercopy_heap_size(bool to_user)
kfree(two);
 }
 
-static void do_usercopy_heap_flag(bool to_user)
+/*
+ * This checks for the specific whitelist window within an object. If this
+ * test passes, then do_usercopy_heap_size() tests will pass too.
+ */
+static void do_usercopy_heap_whitelist(bool to_user)
 {
-   unsigned long user_addr;
-   unsigned char *good_buf = NULL;
-   unsigned char *bad_buf = NULL;
+   unsigned long user_alloc;
+   unsigned char *buf = NULL;
+   unsigned char __user *user_addr;
+   size_t offset, size;
 
/* Make sure cache was prepared. */
-   if (!bad_cache) {
+   if (!whitelist_cache) {
pr_warn("Failed to allocate kernel cache\n");
return;
}
 
/*
-* Allocate one buffer from each cache (kmalloc will have the
-* SLAB_USERCOPY flag already, but "bad_cache" won't).
+* Allocate a buffer with a whitelisted window in the buffer.
 */
-   good_buf = kmalloc(cache_size, GFP_KERNEL);
-   bad_buf = kmem_cache_alloc(bad_cache, GFP_KERNEL);
-   if (!good_buf || !bad_buf) {
-   pr_warn("Failed to allocate buffers from caches\n");
+   buf = kmem_cache_alloc(whitelist_cache, GFP_KERNEL);
+   if (!buf) {
+   pr_warn("Failed to allocate buffer from whitelist cache\n");
goto free_alloc;
}
 
/* Allocate user memory we'll poke at. */
-   user_addr = vm_mmap(NULL, 0, PAGE_SIZE,
+   user_alloc = vm_mmap(NULL, 0, PAGE_SIZE,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_ANONYMOUS | MAP_PRIVATE, 0);
-   if (user_addr >= TASK_SIZE) {
+   if (user_alloc >= TASK_SIZE) {
pr_warn("Failed to allocate user memory\n");
goto free_alloc;
}
+   user_addr = (void __user *)user_alloc;
 
-   memset(good_buf, 'A', cache_size);
-   memset(bad_buf, 'B', cache_size);
+   memset(buf, 'B', cache_size);
+
+   /* Whitelisted window in buffer, from kmem_cache_create_usercopy. */
+   offset = (cache_size / 4) + unconst;
+   size = (cache_size / 16) + unconst;
 
if (to_user) {
-   pr_info("attempting good copy_to_user with SLAB_USER

[PATCH v5 00/38] Hardened usercopy whitelisting

2018-01-10 Thread Kees Cook

v5:
- add Acks
- split stddef changes into separate patch
- further refactor reporting code for readability
- adjust enforcement code for greater readability

v4:
- refactor reporting to include offset and remove %p
- explicitly WARN by default for the whitelisting
- add KVM whitelists and harden ioctl handling

v3:
- added LKDTM update patch
- downgrade BUGs to WARNs and fail closed
- add Acks/Reviews from v2

v2:
- added tracing of allocation and usage
- refactored solutions for task_struct
- split up network patches for readability

I intend for this to land via my usercopy hardening tree, so Acks,
Reviewed, and Tested-bys would be greatly appreciated. FWIW, the bulk of
this series has survived for months in 0-day testing and -next, with the
more recently-added offset-reporting having existed there for a week.

Linus, getting your attention early on this -- instead of during the
merge window :) -- would be greatly appreciated. I'm hoping this is a
good time, in a slight lull in PTI and related things needing attention.



This series is modified from Brad Spengler/PaX Team's PAX_USERCOPY code
in the last public patch of grsecurity/PaX based on our understanding
of the code. Changes or omissions from the original code are ours and
don't reflect the original grsecurity/PaX code.

David Windsor did the bulk of the porting, refactoring, splitting,
testing, etc; I did some extra tweaks, hunk moving, traces, and extra
patches.

Description from patch 1:


Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory:   48074720
Usercopyable Memory:  6367532  13.2%
 task_struct0.2% 4480/1630720
 RAW0.3%300/96000
 RAWv6  2.1%   1408/64768
 ext4_inode_cache   3.0%   269760/8740224
 dentry11.1%   585984/5273856
 mm_struct 29.1% 54912/188448
 kmalloc-8100.0%  24576/24576
 kmalloc-16   100.0%  28672/28672
 kmalloc-32   100.0%  81920/81920
 kmalloc-192  100.0%  96768/96768
 kmalloc-128  100.0%143360/143360
 names_cache  100.0%163840/163840
 kmalloc-64   100.0%167936/167936
 kmalloc-256  100.0%339968/339968
 kmalloc-512  100.0%350720/350720
 kmalloc-96   100.0%455616/455616
 kmalloc-8192 100.0%655360/655360
 kmalloc-1024 100.0%812032/812032
 kmalloc-4096 100.0%819200/819200
 kmalloc-2048 100.0%  1310720/1310720

After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:

Total Slab Memory:   95516184
Usercopyable Memory:  8497452   8.8%
 task_struct0.2% 4000/1456000
 RAW0.3%300/96000
 RAWv6  2.1%   1408/64768
 ext4_inode_cache   3.0% 1217280/39439872
 dentry11.1%

[PATCH 04/38] lkdtm/usercopy: Adjust test to include an offset to check reporting

2018-01-10 Thread Kees Cook

Instead of doubling the size, push the start position up by 16 bytes to
still trigger an overflow. This allows to verify that offset reporting
is working correctly.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/misc/lkdtm_usercopy.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/lkdtm_usercopy.c b/drivers/misc/lkdtm_usercopy.c
index a64372cc148d..9ebbb031e5e3 100644
--- a/drivers/misc/lkdtm_usercopy.c
+++ b/drivers/misc/lkdtm_usercopy.c
@@ -119,6 +119,8 @@ static void do_usercopy_heap_size(bool to_user)
 {
unsigned long user_addr;
unsigned char *one, *two;
+   void __user *test_user_addr;
+   void *test_kern_addr;
size_t size = unconst + 1024;
 
one = kmalloc(size, GFP_KERNEL);
@@ -139,27 +141,30 @@ static void do_usercopy_heap_size(bool to_user)
memset(one, 'A', size);
memset(two, 'B', size);
 
+   test_user_addr = (void __user *)(user_addr + 16);
+   test_kern_addr = one + 16;
+
if (to_user) {
pr_info("attempting good copy_to_user of correct size\n");
-   if (copy_to_user((void __user *)user_addr, one, size)) {
+   if (copy_to_user(test_user_addr, test_kern_addr, size / 2)) {
pr_warn("copy_to_user failed unexpectedly?!\n");
goto free_user;
}
 
pr_info("attempting bad copy_to_user of too large size\n");
-   if (copy_to_user((void __user *)user_addr, one, 2 * size)) {
+   if (copy_to_user(test_user_addr, test_kern_addr, size)) {
pr_warn("copy_to_user failed, but lacked Oops\n");
goto free_user;
}
} else {
pr_info("attempting good copy_from_user of correct size\n");
-   if (copy_from_user(one, (void __user *)user_addr, size)) {
+   if (copy_from_user(test_kern_addr, test_user_addr, size / 2)) {
pr_warn("copy_from_user failed unexpectedly?!\n");
goto free_user;
}
 
pr_info("attempting bad copy_from_user of too large size\n");
-   if (copy_from_user(one, (void __user *)user_addr, 2 * size)) {
+   if (copy_from_user(test_kern_addr, test_user_addr, size)) {
pr_warn("copy_from_user failed, but lacked Oops\n");
goto free_user;
}
-- 
2.7.4

[PATCH 02/38] usercopy: Enhance and rename report_usercopy()

2018-01-10 Thread Kees Cook

In preparation for refactoring the usercopy checks to pass offset to
the hardened usercopy report, this renames report_usercopy() to the
more accurate usercopy_abort(), marks it as noreturn because it is,
adds a hopefully helpful comment for anyone investigating such reports,
makes the function available to the slab allocators, and adds new "detail"
and "offset" arguments.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 mm/slab.h |  6 ++
 mm/usercopy.c | 24 +++-
 tools/objtool/check.c |  1 +
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index ad657ffa44e5..7d29e69ac310 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -526,4 +526,10 @@ static inline int cache_random_seq_create(struct 
kmem_cache *cachep,
 static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
 
+#ifdef CONFIG_HARDENED_USERCOPY
+void __noreturn usercopy_abort(const char *name, const char *detail,
+  bool to_user, unsigned long offset,
+  unsigned long len);
+#endif
+
 #endif /* MM_SLAB_H */
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 5df1e68d4585..8006baa4caac 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -58,11 +58,25 @@ static noinline int check_stack_object(const void *obj, 
unsigned long len)
return GOOD_STACK;
 }
 
-static void report_usercopy(unsigned long len, bool to_user, const char *type)
+/*
+ * If this function is reached, then CONFIG_HARDENED_USERCOPY has found an
+ * unexpected state during a copy_from_user() or copy_to_user() call.
+ * There are several checks being performed on the buffer by the
+ * __check_object_size() function. Normal stack buffer usage should never
+ * trip the checks, and kernel text addressing will always trip the check.
+ * For cache objects, copies must be within the object size.
+ */
+void __noreturn usercopy_abort(const char *name, const char *detail,
+  bool to_user, unsigned long offset,
+  unsigned long len)
 {
-   pr_emerg("kernel memory %s attempt detected %s '%s' (%lu bytes)\n",
-   to_user ? "exposure" : "overwrite",
-   to_user ? "from" : "to", type ? : "unknown", len);
+   pr_emerg("Kernel memory %s attempt detected %s %s%s%s%s (offset %lu, 
size %lu)!\n",
+to_user ? "exposure" : "overwrite",
+to_user ? "from" : "to",
+name ? : "unknown?!",
+detail ? " '" : "", detail ? : "", detail ? "'" : "",
+offset, len);
+
/*
 * For greater effect, it would be nice to do do_group_exit(),
 * but BUG() actually hooks all the lock-breaking and per-arch
@@ -260,6 +274,6 @@ void __check_object_size(const void *ptr, unsigned long n, 
bool to_user)
return;
 
 report:
-   report_usercopy(n, to_user, err);
+   usercopy_abort(err, NULL, to_user, 0, n);
 }
 EXPORT_SYMBOL(__check_object_size);
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 9b341584eb1b..ae39444896d4 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -138,6 +138,7 @@ static int __dead_end_function(struct objtool_file *file, 
struct symbol *func,
"__reiserfs_panic",
"lbug_with_loc",
"fortify_panic",
+   "usercopy_abort",
};
 
if (func->bind == STB_WEAK)
-- 
2.7.4

[PATCH 03/38] usercopy: Include offset in hardened usercopy report

2018-01-10 Thread Kees Cook

This refactors the hardened usercopy code so that failure reporting can
happen within the checking functions instead of at the top level. This
simplifies the return value handling and allows more details and offsets
to be included in the report. Having the offset can be much more helpful
in understanding hardened usercopy bugs.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 include/linux/slab.h | 12 +++
 mm/slab.c|  8 ++---
 mm/slub.c| 14 
 mm/usercopy.c| 95 +++-
 4 files changed, 57 insertions(+), 72 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 50697a1d6621..2dbeccdcb76b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -167,15 +167,11 @@ void kzfree(const void *);
 size_t ksize(const void *);
 
 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
-const char *__check_heap_object(const void *ptr, unsigned long n,
-   struct page *page);
+void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
+   bool to_user);
 #else
-static inline const char *__check_heap_object(const void *ptr,
- unsigned long n,
- struct page *page)
-{
-   return NULL;
-}
+static inline void __check_heap_object(const void *ptr, unsigned long n,
+  struct page *page, bool to_user) { }
 #endif
 
 /*
diff --git a/mm/slab.c b/mm/slab.c
index 183e996dde5f..b2beb2cc15e2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4397,8 +4397,8 @@ module_init(slab_proc_init);
  * Returns NULL if check passes, otherwise const char * to name of cache
  * to indicate an error.
  */
-const char *__check_heap_object(const void *ptr, unsigned long n,
-   struct page *page)
+void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
+bool to_user)
 {
struct kmem_cache *cachep;
unsigned int objnr;
@@ -4414,9 +4414,9 @@ const char *__check_heap_object(const void *ptr, unsigned 
long n,
 
/* Allow address range falling entirely within object size. */
if (offset <= cachep->object_size && n <= cachep->object_size - offset)
-   return NULL;
+   return;
 
-   return cachep->name;
+   usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
 
diff --git a/mm/slub.c b/mm/slub.c
index cfd56e5a35fb..bcd22332300a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3818,8 +3818,8 @@ EXPORT_SYMBOL(__kmalloc_node);
  * Returns NULL if check passes, otherwise const char * to name of cache
  * to indicate an error.
  */
-const char *__check_heap_object(const void *ptr, unsigned long n,
-   struct page *page)
+void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
+bool to_user)
 {
struct kmem_cache *s;
unsigned long offset;
@@ -3831,7 +3831,8 @@ const char *__check_heap_object(const void *ptr, unsigned 
long n,
 
/* Reject impossible pointers. */
if (ptr < page_address(page))
-   return s->name;
+   usercopy_abort("SLUB object not in SLUB page?!", NULL,
+  to_user, 0, n);
 
/* Find offset within object. */
offset = (ptr - page_address(page)) % s->size;
@@ -3839,15 +3840,16 @@ const char *__check_heap_object(const void *ptr, 
unsigned long n,
/* Adjust for redzone and reject if within the redzone. */
if (kmem_cache_debug(s) && s->flags & SLAB_RED_ZONE) {
if (offset < s->red_left_pad)
-   return s->name;
+   usercopy_abort("SLUB object in left red zone",
+  s->name, to_user, offset, n);
offset -= s->red_left_pad;
}
 
/* Allow address range falling entirely within object size. */
if (offset <= object_size && n <= object_size - offset)
-   return NULL;
+   return;
 
-   return s->name;
+   usercopy_abort("SLUB object", s->name, to_user, offset, n);
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
 
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 8006baa4caac..a562dd094ace 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -86,10 +86,10 @@ void __noreturn usercopy_abort(const char *name, const char 
*detail,
 }
 
 /* Returns true if any portion of [ptr,ptr+n) over laps with [low,high). */
-static bool overlaps(const void *ptr, unsigned long n, unsigned long low,
-unsigned long high)
+static bool overlaps(const unsigned long ptr, unsigned long n,
+unsigned long low, uns

[PATCH 22/38] scsi: Define usercopy region in scsi_sense_cache slab cache

2018-01-10 Thread Kees Cook

From: David Windsor <d...@nullcore.net>

SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore
contained in the scsi_sense_cache slab cache, need to be copied to/from
userspace.

cache object allocation:
drivers/scsi/scsi_lib.c:
scsi_select_sense_cache(...):
return ... ? scsi_sense_isadma_cache : scsi_sense_cache

scsi_alloc_sense_buffer(...):
return kmem_cache_alloc_node(scsi_select_sense_cache(), ...);

scsi_init_request(...):
...
cmd->sense_buffer = scsi_alloc_sense_buffer(...);
...
cmd->req.sense = cmd->sense_buffer

example usage trace:

block/scsi_ioctl.c:
(inline from sg_io)
blk_complete_sghdr_rq(...):
struct scsi_request *req = scsi_req(rq);
...
copy_to_user(..., req->sense, len)

scsi_cmd_ioctl(...):
sg_io(...);

In support of usercopy hardening, this patch defines a region in
the scsi_sense_cache slab cache in which userspace copy operations
are allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

Signed-off-by: David Windsor <d...@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: "James E.J. Bottomley" <j...@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
Cc: linux-s...@vger.kernel.org
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 drivers/scsi/scsi_lib.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1cbc497e00bd..164d062c4d94 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -79,14 +79,15 @@ int scsi_init_sense_cache(struct Scsi_Host *shost)
if (shost->unchecked_isa_dma) {
scsi_sense_isadma_cache =
kmem_cache_create("scsi_sense_cache(DMA)",
-   SCSI_SENSE_BUFFERSIZE, 0,
-   SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
+   SCSI_SENSE_BUFFERSIZE, 0,
+   SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
if (!scsi_sense_isadma_cache)
ret = -ENOMEM;
} else {
scsi_sense_cache =
-   kmem_cache_create("scsi_sense_cache",
-   SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN, NULL);
+   kmem_cache_create_usercopy("scsi_sense_cache",
+   SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN,
+   0, SCSI_SENSE_BUFFERSIZE, NULL);
if (!scsi_sense_cache)
ret = -ENOMEM;
}
-- 
2.7.4

[PATCH 05/38] stddef.h: Introduce sizeof_field()

2018-01-10 Thread Kees Cook

The size of fields within a structure is needed in a few places in the
kernel already, and will be needed for the usercopy whitelisting when
declaring whitelist regions within structures. This creates a dedicated
macro and redefines offsetofend() to use it.

Existing usage, ignoring the 1200+ lustre assert uses:

$ git grep -E 'sizeof\(\(\((struct )?[a-zA-Z_]+ \*\)0\)->' | \
grep -v staging/lustre | wc -l
65

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 include/linux/stddef.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 2181719fd907..998a4ba28eba 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -20,12 +20,20 @@ enum {
 #endif
 
 /**
+ * sizeof_field(TYPE, MEMBER)
+ *
+ * @TYPE: The structure containing the field of interest
+ * @MEMBER: The field to return the size of
+ */
+#define sizeof_field(TYPE, MEMBER) sizeofTYPE *)0)->MEMBER))
+
+/**
  * offsetofend(TYPE, MEMBER)
  *
  * @TYPE: The type of the structure
  * @MEMBER: The member within the structure to get the end offset of
  */
 #define offsetofend(TYPE, MEMBER) \
-   (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
+   (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))
 
 #endif
-- 
2.7.4

1 2 3 4 5 6 7 >

1 - 100 of 602 matches

Mail list logo