Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

2020-09-23 Thread Solar Designer
On Wed, Sep 23, 2020 at 08:00:07PM +0200, Solar Designer wrote:
> A couple of other things Brad kindly pointed out:
> 
> SELinux already has similar protections (execmem, execmod):
> 
> http://lkml.iu.edu/hypermail/linux/kernel/0508.2/0194.html
> https://danwalsh.livejournal.com/6117.html

Actually, that's right in Madhavan's "Introduction": "LSMs such as
SELinux implement W^X" and "The W^X implementation today is not
complete."  I'm sorry I jumped into this thread out of context.

Alexander


Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

2020-09-23 Thread Solar Designer
On Wed, Sep 23, 2020 at 04:39:31PM +0200, Florian Weimer wrote:
> * Solar Designer:
> 
> > While I share my opinion here, I don't mean that to block Madhavan's
> > work.  I'd rather defer to people more knowledgeable in current userland
> > and ABI issues/limitations and plans on dealing with those, especially
> > to Florian Weimer.  I haven't seen Florian say anything specific for or
> > against Madhavan's proposal, and I'd like to.  (Have I missed that?)

[...]
> I think it's unnecessary for the libffi use case.
[...]

> I don't know if kernel support could
> make sense in this context, but it would be a completely different
> patch.

Thanks.  Are there currently relevant use cases where the proposed
trampfd would be useful and likely actually made use of by userland -
e.g., specific userland project developers saying they'd use it, or
Madhavan intending to develop and contribute userland patches?

Alexander


Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

2020-09-23 Thread Solar Designer
On Wed, Sep 23, 2020 at 05:18:35PM +0200, Pavel Machek wrote:
> > It sure does make sense to combine ret2libc/ROP to mprotect() with one's
> > own injected shellcode.  Compared to doing everything from ROP, this is
> > easier and more reliable across versions/builds if the desired
> > payload
> 
> Ok, so this starts to be a bit confusing.
> 
> I thought W^X is to protect from attackers that have overflowed buffer
> somewhere, but can not to do arbitrary syscalls, yet.
> 
> You are saying that there's important class of attackers that can do
> some syscalls but not arbitrary ones.

They might be able to do many, most, or all arbitrary syscalls via
ret2libc or such.  The crucial detail is that each time they do that,
they risk incompatibility with the given target system (version, build,
maybe ASLR if gadgets from multiple libraries are involved).  By using
mprotect(), they only take this risk once (need to get the address of an
mprotect() gadget and of what to change protections on right), and then
they can invoke multiple syscalls from their shellcode more reliably.
So for doing a lot of work, mprotect() combined with injected code can
be easier and more reliable.  It is also an extra option an attacker can
use, in addition to doing everything via borrowed code.  More
flexibility for the attacker means the attacker may choose whichever
approach works better in a given case (or try several).

I am embarrassed for not thinking/recalling this when I first posted
earlier today.  It's actually obvious.  I'm just getting old and rusty.

> I'd like to see definition of that attacker (and perhaps description
> of the system the protection is expected to be useful on -- if it is
> not close to common Linux distros).

There's nothing unusual about that attacker and the system.

A couple of other things Brad kindly pointed out:

SELinux already has similar protections (execmem, execmod):

http://lkml.iu.edu/hypermail/linux/kernel/0508.2/0194.html
https://danwalsh.livejournal.com/6117.html

PaX MPROTECT is implemented in a way or at a layer that covers ptrace()
abuse that I mentioned.  (At least that's how I understood Brad.)

Alexander

P.S. Meanwhile, Twitter locked my account "for security purposes".  Fun.
I'll just let it be for now.


Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

2020-09-23 Thread Solar Designer
On Wed, Sep 23, 2020 at 11:14:56AM +0200, Solar Designer wrote:
> On Wed, Sep 23, 2020 at 10:14:26AM +0200, Pavel Machek wrote:
> > > Introduction
> > > 
> > > 
> > > Dynamic code is used in many different user applications. Dynamic code is
> > > often generated at runtime. Dynamic code can also just be a pre-defined
> > > sequence of machine instructions in a data buffer. Examples of dynamic
> > > code are trampolines, JIT code, DBT code, etc.
> > > 
> > > Dynamic code is placed either in a data page or in a stack page. In order
> > > to execute dynamic code, the page it resides in needs to be mapped with
> > > execute permissions. Writable pages with execute permissions provide an
> > > attack surface for hackers. Attackers can use this to inject malicious
> > > code, modify existing code or do other harm.
> > > 
> > > To mitigate this, LSMs such as SELinux implement W^X. That is, they may 
> > > not
> > > allow pages to have both write and execute permissions. This prevents
> > > dynamic code from executing and blocks applications that use it. To allow
> > > genuine applications to run, exceptions have to be made for them (by 
> > > setting
> > > execmem, etc) which opens the door to security issues.
> > > 
> > > The W^X implementation today is not complete. There exist many user level
> > > tricks that can be used to load and execute dynamic code. E.g.,
> > > 
> > > - Load the code into a file and map the file with R-X.
> > > 
> > > - Load the code in an RW- page. Change the permissions to R--. Then,
> > >   change the permissions to R-X.
> > > 
> > > - Load the code in an RW- page. Remap the page with R-X to get a separate
> > >   mapping to the same underlying physical page.
> > > 
> > > IMO, these are all security holes as an attacker can exploit them to 
> > > inject
> > > his own code.
> > 
> > IMO, you are smoking crack^H^H very seriously misunderstanding what
> > W^X is supposed to protect from.
> > 
> > W^X is not supposed to protect you from attackers that can already do
> > system calls. So loading code into a file then mapping the file as R-X
> > is in no way security hole in W^X.
> > 
> > If you want to provide protection from attackers that _can_ do system
> > calls, fine, but please don't talk about W^X and please specify what
> > types of attacks you want to prevent and why that's good thing.
> 
> On one hand, Pavel is absolutely right.  It is ridiculous to say that
> "these are all security holes as an attacker can exploit them to inject
> his own code."

I stand corrected, due to Brad's tweet and follow-ups here:

https://twitter.com/spendergrsec/status/1308728284390318082

It sure does make sense to combine ret2libc/ROP to mprotect() with one's
own injected shellcode.  Compared to doing everything from ROP, this is
easier and more reliable across versions/builds if the desired payload
is non-trivial.  My own example: invoking a shell in a local attack on
Linux is trivial enough to do via ret2libc only, but a connect-back
shell in a remote attack might be easier and more reliably done via
mprotect() + shellcode.

Per the follow-ups, this was an established technique on Windows and iOS
until further hardening prevented it.  So it does make sense for Linux
to do the same (as an option because of it breaking existing stuff), and
not so much as policy enforcement for the sake of it and ease of
reasoning, but mostly to force real-world exploits to be more complex
and less reliable.

> On the other hand, "what W^X is supposed to protect from" depends on how
> the term W^X is defined (historically, by PaX and OpenBSD).  It may be
> that W^X is partially not a feature to defeat attacks per se, but also a
> policy enforcement feature preventing use of dangerous techniques (JIT).
> 
> Such policy might or might not make sense.  It might make sense for ease
> of reasoning, e.g. "I've flipped this setting, and now I'm certain the
> system doesn't have JIT within a process (can still have it through
> dynamically creating and invoking an entire new program), so there are
> no opportunities for an attacker to inject code nor generate previously
> non-existing ROP gadgets into an executable mapping within a process."
> 
> I do find it questionable whether such policy and such reasoning make
> sense beyond academia.

I was wrong in the above, focusing on the wrong thing.

> Then, there might be even more ways in which W^X is not perfect enough
> to enable such reasoning.  What about using ptrace(2) to inject code?

Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

2020-09-23 Thread Solar Designer
On Wed, Sep 23, 2020 at 10:14:26AM +0200, Pavel Machek wrote:
> > Introduction
> > 
> > 
> > Dynamic code is used in many different user applications. Dynamic code is
> > often generated at runtime. Dynamic code can also just be a pre-defined
> > sequence of machine instructions in a data buffer. Examples of dynamic
> > code are trampolines, JIT code, DBT code, etc.
> > 
> > Dynamic code is placed either in a data page or in a stack page. In order
> > to execute dynamic code, the page it resides in needs to be mapped with
> > execute permissions. Writable pages with execute permissions provide an
> > attack surface for hackers. Attackers can use this to inject malicious
> > code, modify existing code or do other harm.
> > 
> > To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
> > allow pages to have both write and execute permissions. This prevents
> > dynamic code from executing and blocks applications that use it. To allow
> > genuine applications to run, exceptions have to be made for them (by setting
> > execmem, etc) which opens the door to security issues.
> > 
> > The W^X implementation today is not complete. There exist many user level
> > tricks that can be used to load and execute dynamic code. E.g.,
> > 
> > - Load the code into a file and map the file with R-X.
> > 
> > - Load the code in an RW- page. Change the permissions to R--. Then,
> >   change the permissions to R-X.
> > 
> > - Load the code in an RW- page. Remap the page with R-X to get a separate
> >   mapping to the same underlying physical page.
> > 
> > IMO, these are all security holes as an attacker can exploit them to inject
> > his own code.
> 
> IMO, you are smoking crack^H^H very seriously misunderstanding what
> W^X is supposed to protect from.
> 
> W^X is not supposed to protect you from attackers that can already do
> system calls. So loading code into a file then mapping the file as R-X
> is in no way security hole in W^X.
> 
> If you want to provide protection from attackers that _can_ do system
> calls, fine, but please don't talk about W^X and please specify what
> types of attacks you want to prevent and why that's good thing.

On one hand, Pavel is absolutely right.  It is ridiculous to say that
"these are all security holes as an attacker can exploit them to inject
his own code."

On the other hand, "what W^X is supposed to protect from" depends on how
the term W^X is defined (historically, by PaX and OpenBSD).  It may be
that W^X is partially not a feature to defeat attacks per se, but also a
policy enforcement feature preventing use of dangerous techniques (JIT).

Such policy might or might not make sense.  It might make sense for ease
of reasoning, e.g. "I've flipped this setting, and now I'm certain the
system doesn't have JIT within a process (can still have it through
dynamically creating and invoking an entire new program), so there are
no opportunities for an attacker to inject code nor generate previously
non-existing ROP gadgets into an executable mapping within a process."

I do find it questionable whether such policy and such reasoning make
sense beyond academia.

Then, there might be even more ways in which W^X is not perfect enough
to enable such reasoning.  What about using ptrace(2) to inject code?
Should enabling W^X also disable ability to debug programs by non-root?
We already have Yama ptrace_scope, which can achieve that at the highest
setting, although that's rather inconvenient and is probably unexpected
by most to be a requirement for having (ridiculously?) full W^X allowing
for the academic reasoning.

Personally, I am for policies that make more practical sense.  For
example, years ago I advocated here on kernel-hardening that we should
have a mode where ELF flags enabling/disabling executable stack are
ignored, and non-executable stack is always enforced.  This should also
be extended to default (at program startup) permissions on more than
just stack (but also on .bss, typical libcs' heap allocations, etc.)
However, I am not convinced there's enough value in extending the policy
to restricting explicit uses of mprotect(2).

Yes, PaX did that, and its emutramp.txt said "runtime code generation is
by its nature incompatible with PaX's PAGEEXEC/SEGMEXEC and MPROTECT
features, therefore the real solution is not in emulation but by
designing a kernel API for runtime code generation and modifying
userland to make use of it."  However, not being convinced in the
MPROTECT feature having enough practical value, I am also not convinced
"a kernel API for runtime code generation and modifying userland to make
use of it" is the way to go.

Having static instead of dynamically-generated trampolines in userland
code where possible (and making other userland/ABI changes to make that
possible in more/all cases) is an obvious improvement, and IMO should be
a priority over the above.

While I share my opinion here, I don't mean that to block Madhavan's
work.  I'd 

Re: [PATCH v2] Documentation/security-bugs: provide more information about linux-distros

2019-07-19 Thread Solar Designer
On Thu, Jul 18, 2019 at 06:51:07PM -0700, Kees Cook wrote:
> On Thu, Jul 18, 2019 at 08:39:19PM -0400, Sasha Levin wrote:
> > On Thu, Jul 18, 2019 at 03:00:55PM -0700, Kees Cook wrote:
> > > On Wed, Jul 17, 2019 at 07:11:03PM -0400, Sasha Levin wrote:
> > > > Provide more information about how to interact with the linux-distros
> > > > mailing list for disclosing security bugs.
> > > > 
> > > > Reference the linux-distros list policy and clarify that the reporter
> > > > must read and understand those policies as they differ from
> > > > secur...@kernel.org's policy.
> > > > 
> > > > Suggested-by: Solar Designer 
> > > > Signed-off-by: Sasha Levin 
> > > 
> > > Sorry, but NACK, see below...

I like Sasha's PATCH v2 better, but if Kees insists on NACK'ing it then
I suggest that we apply Sasha's first revision of the patch instead.
I think either revision is an improvement on the status quo.

> I think reinforcing information to avoid past mistakes is appropriate
> here.

Maybe, but from my perspective common past issues with Linux kernel bugs
reported to linux-distros were:

- The reporter having been directed to post from elsewhere (and I
suspect this documentation file) without being aware of list policy.

- The reporter not mentioning (and sometimes not replying even when
asked) whether they're also coordinating with security@k.o or whether
they want someone on linux-distros to help coordinate with security@k.o.
(Maybe this is something we want to write about here.)

- The Linux kernel bug having been introduced too recently to be of much
interest to distros.

> Reports have regularly missed the "[vs]" detail or suggested
> embargoes that ended on Fridays, etc.

This happens too.  Regarding missing the "[vs]" detail, technically
there are also a number of other conditions that also let the message
through, but those are changing and are deliberately not advertised.

> Sending to the distros@ list risks exposing Linux-only flaws to non-Linux
> distros.

Right.

> This has caused leaks in the past

Do you mean leaks to *BSD security teams or to the public?  I'm not
aware of past leaks to the public via the non-Linux distros present on
the distros@ list.  Are you?

Alexander


Re: [PATCH 1/3] x86/asm: Pin sensitive CR0 bits

2019-02-27 Thread Solar Designer
On Tue, Feb 26, 2019 at 03:36:45PM -0800, Kees Cook wrote:
>  static inline void native_write_cr0(unsigned long val)
>  {
> - asm volatile("mov %0,%%cr0": : "r" (val), "m" (__force_order));
> + bool warn = false;
> +
> +again:
> + val |= X86_CR0_WP;
> + /*
> +  * In order to have the compiler not optimize away the check
> +  * in the WARN_ONCE(), mark "val" as being also an output ("+r")

This comment is now slightly out of date: the check is no longer "in the
WARN_ONCE()".  Ditto about the comment for CR4.

> +  * by this asm() block so it will perform an explicit check, as
> +  * if it were "volatile".
> +  */
> + asm volatile("mov %0,%%cr0": "+r" (val) : "m" (__force_order) : );
> + /*
> +  * If the MOV above was used directly as a ROP gadget we can
> +  * notice the lack of pinned bits in "val" and start the function
> +  * from the beginning to gain the WP bit for sure. And do it
> +  * without first taking the exception for a WARN().
> +  */
> + if ((val & X86_CR0_WP) != X86_CR0_WP) {
> + warn = true;
> + goto again;
> + }
> + WARN_ONCE(warn, "Attempt to unpin X86_CR0_WP, cr0 bypass attack?!\n");
>  }

Alexander


Re: [PATCH v2] x86/asm: Pin sensitive CR4 bits

2019-02-21 Thread Solar Designer
On Wed, Feb 20, 2019 at 01:20:58PM -0800, Kees Cook wrote:
> On Wed, Feb 20, 2019 at 10:49 AM Solar Designer  wrote:
> >
> > On Wed, Feb 20, 2019 at 10:09:34AM -0800, Kees Cook wrote:
> > > + if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))
> > > + goto again;
> >
> > I think "goto again" is too mild a response given that it occurs after a
> > successful write of a non-pinned value to CR4.  I think it'd allow some
> > exploits to eventually win the race: make their desired use of whatever
> > functionality SMEP, etc. would have prevented - which may be just a few
> > instructions they need to run - before the CR4 value is reverted after
> > "goto again".  I think it's one of those cases where a kernel panic
> > would be more appropriate.
> 
> It will not land upstream with a BUG() or panic(). Linus has
> explicitly stated that none of this work can do that until it has
> "baked" in the kernel for a couple years.

OK.

> In his defense, anyone sufficiently paranoid can already raise the
> priority of a WARN() into a panic via sysctl kernel.panic_on_warn (and
> kernel.panic_on_oops).

I think there are too many uses of WARN() for anyone sane to enable
that in production, whereas it'd have made sense to enable it for the
few security-related uses.

> > Also, WARN_ONCE possibly introduces a delay sufficient to realistically
> > win this race on the first try.  If we choose to warn, we should do it
> > after having reverted the CR4 value, not before.
> 
> Isn't cr4 CPU-local though?

Good point.  I don't know.  If CR4 is per hardware thread, then the race
would require an interrupt and would be much harder to win.

> Couldn't we turn off interrupts to stop the race?

This won't help.  An attack would skip the code that disables interrupts
and land right on the MOV instruction.

Alexander


Re: [PATCH v2] x86/asm: Pin sensitive CR4 bits

2019-02-20 Thread Solar Designer
On Wed, Feb 20, 2019 at 10:09:34AM -0800, Kees Cook wrote:
> +extern volatile unsigned long cr4_pin;
> +
>  static inline void native_write_cr4(unsigned long val)
>  {
> +again:
> + val |= cr4_pin;
>   asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
> + /*
> +  * If the MOV above was used directly as a ROP gadget we can
> +  * notice the lack of pinned bits in "val" and start the function
> +  * from the beginning to gain the cr4_pin bits for sure.
> +  */
> + if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))
> + goto again;
>  }

I think "goto again" is too mild a response given that it occurs after a
successful write of a non-pinned value to CR4.  I think it'd allow some
exploits to eventually win the race: make their desired use of whatever
functionality SMEP, etc. would have prevented - which may be just a few
instructions they need to run - before the CR4 value is reverted after
"goto again".  I think it's one of those cases where a kernel panic
would be more appropriate.

Also, WARN_ONCE possibly introduces a delay sufficient to realistically
win this race on the first try.  If we choose to warn, we should do it
after having reverted the CR4 value, not before.

Alexander


Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories

2017-12-07 Thread Solar Designer
On Tue, Dec 05, 2017 at 11:21:00AM +0100, Salvatore Mesoraca wrote:
> 2017-11-30 17:30 GMT+01:00 Solar Designer <so...@openwall.com>:
> > $ strace flock /tmp/lockfile -c cat
> > [...]
> > open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
> > flock(3, LOCK_EX)   = 0
> >
> > This use of flock(1) would be a worse vulnerability, not limited to DoS
> > against the script itself but also allowing for privilege escalation
> > unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
> > would help (reduce the impact back to DoS against itself), and would
> > require that the retry logic (like what is seen in the lock directory
> > example above) be present.

> > > That behavior can be certainly avoided, but of course it isn't a
> > > security problem per se.
> >
> > I think it is a security problem per se, except in the "subtle case"
> > above, and it's great that our proposed policy would catch it.
> 
> I agree on the DoS, though, at first, I didn't consider it a "bug" because
> there isn't any open mode that can prevent the DoS in this case.
> If you want to avoid it, you must avoid other-users-writable directories
> at all. So, It think that, if you are using a sticky directory, it's
> intended behavior to let someone else "lock" you.

Right.  There's a worse DoS I had overlooked, though: flock(1) can also
be made to create and/or lock another file (maybe in another directory).
Perhaps adding O_NOFOLLOW (alongside flock(1)'s existing use of
O_NOCTTY) would be a good idea, even though uses in a directory writable
by someone else are inherently risky anyway.

> But maybe many flock(1) users are not aware of the issue and so, sometimes,
> it can be unintended.
> I didn't consider privilege escalation as an issue because I always
> looked at flock(1) under the assumption that the lockfile is never actually
> read or modified in any way and so it shouldn't make too much difference if
> it's an already existing regular file or a symlink etc.
> Am I missing something?

You made a good point, but yes: O_CREAT will follow a dangling symlink
and there are cases where creating an empty file of an attacker-chosen
pathname may allow for privilege escalation.  For example, crontab(1)
man page on RHEL7 says regarding /etc/cron.allow and /etc/cron.deny:

"If neither of these files exists, only the super user is allowed to
use cron."

In that configuration, simply creating empty /etc/cron.deny grants
access to crontab(1) to all users.  As user:

$ crontab -l
You (solar) are not allowed to use this program (crontab)
See crontab(1) for more information
$ ln -s /etc/cron.deny /tmp/lockfile

As root:

# sysctl -w fs.protected_symlinks=0
fs.protected_symlinks = 0
# flock /tmp/lockfile -c echo

As user:

$ crontab -l
no crontab for solar

There may also be side-effects on open of device files (the best known
example is rewinding a tape), and we won't avoid those by retrying
without O_CREAT|O_EXCL.  O_NOFOLLOW will help against symlinks to device
files, but not against hard links (if on same device).  The kernel's
symlink and hardlink protections help, but in this sub-thread we were
discussing detecting userspace software issues without waiting for an
attack.  Things like this fit David Laight's point well: programs trying
to make risky (mis)uses less risky sometimes also avoid being detected
by our proposed policy.  That's life.

Alexander


Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories

2017-12-07 Thread Solar Designer
On Tue, Dec 05, 2017 at 11:21:00AM +0100, Salvatore Mesoraca wrote:
> 2017-11-30 17:30 GMT+01:00 Solar Designer :
> > $ strace flock /tmp/lockfile -c cat
> > [...]
> > open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
> > flock(3, LOCK_EX)   = 0
> >
> > This use of flock(1) would be a worse vulnerability, not limited to DoS
> > against the script itself but also allowing for privilege escalation
> > unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
> > would help (reduce the impact back to DoS against itself), and would
> > require that the retry logic (like what is seen in the lock directory
> > example above) be present.

> > > That behavior can be certainly avoided, but of course it isn't a
> > > security problem per se.
> >
> > I think it is a security problem per se, except in the "subtle case"
> > above, and it's great that our proposed policy would catch it.
> 
> I agree on the DoS, though, at first, I didn't consider it a "bug" because
> there isn't any open mode that can prevent the DoS in this case.
> If you want to avoid it, you must avoid other-users-writable directories
> at all. So, It think that, if you are using a sticky directory, it's
> intended behavior to let someone else "lock" you.

Right.  There's a worse DoS I had overlooked, though: flock(1) can also
be made to create and/or lock another file (maybe in another directory).
Perhaps adding O_NOFOLLOW (alongside flock(1)'s existing use of
O_NOCTTY) would be a good idea, even though uses in a directory writable
by someone else are inherently risky anyway.

> But maybe many flock(1) users are not aware of the issue and so, sometimes,
> it can be unintended.
> I didn't consider privilege escalation as an issue because I always
> looked at flock(1) under the assumption that the lockfile is never actually
> read or modified in any way and so it shouldn't make too much difference if
> it's an already existing regular file or a symlink etc.
> Am I missing something?

You made a good point, but yes: O_CREAT will follow a dangling symlink
and there are cases where creating an empty file of an attacker-chosen
pathname may allow for privilege escalation.  For example, crontab(1)
man page on RHEL7 says regarding /etc/cron.allow and /etc/cron.deny:

"If neither of these files exists, only the super user is allowed to
use cron."

In that configuration, simply creating empty /etc/cron.deny grants
access to crontab(1) to all users.  As user:

$ crontab -l
You (solar) are not allowed to use this program (crontab)
See crontab(1) for more information
$ ln -s /etc/cron.deny /tmp/lockfile

As root:

# sysctl -w fs.protected_symlinks=0
fs.protected_symlinks = 0
# flock /tmp/lockfile -c echo

As user:

$ crontab -l
no crontab for solar

There may also be side-effects on open of device files (the best known
example is rewinding a tape), and we won't avoid those by retrying
without O_CREAT|O_EXCL.  O_NOFOLLOW will help against symlinks to device
files, but not against hard links (if on same device).  The kernel's
symlink and hardlink protections help, but in this sub-thread we were
discussing detecting userspace software issues without waiting for an
attack.  Things like this fit David Laight's point well: programs trying
to make risky (mis)uses less risky sometimes also avoid being detected
by our proposed policy.  That's life.

Alexander


Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories

2017-11-30 Thread Solar Designer
On Thu, Nov 30, 2017 at 04:53:06PM +, David Laight wrote:
> From: Salvatore Mesoraca
> > if a program tries to open a file, in a sticky directory,
> > with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> > This feature allows to detect and potentially block programs that
> > act this way, it can be used to find vulnerabilities (like those
> > prevented by patch #1) and to do policy enforcement.

> I presume the 'vulnerabilities' are related to symlinks being created
> just before the open?

That's one way to exploit them.  We already have a sysctl to try and
block that specific attack, and we're considering adding more, for other
attacks.  But we'd also like an easy way to learn of the vulnerabilities
without anyone trying to exploit them yet, hence this patch.

> Trouble is this change breaks a lot of general use of /tmp.

That's general misuse of /tmp.  Things like "command > /tmp/file"
without having pre-created the file with O_EXCL e.g. by mktemp(1).

> I always assumed that code that cared would use O_EXCL and
> everything else wasn't worth subverting.

Opinions will vary on whether it's worth subverting or not, and someone
may reasonably want to configure this differently on different systems.

> I found code in vi (and elsewhere) that subverted these checks
> by opening with O_WRONLY if stat() showed the file existed and
> O_CREAT|O_EXCL if it didn't.

Yes, such misuses of /tmp and the like by use of those programs - where
the potential consequences would often be less severe (if your example
above is correct, it sounds like it'd be data spoofing but not
overwriting another file over a link) - could remain unnoticed.

> I'm pretty sure that traditionally a lot of these opens were done
> with O_CREAT|O_TRUNC.
> Implementing that as unlink() followed by a create would stop
> 'random' (ok all) symlinks being followed.

That's racy.  We have O_EXCL, and it must be used along with O_CREAT
when the directory is untrusted.  (If it were only about symlinks, we
also already have O_NOFOLLOW.)

> Overall I'm pretty sure this change will break things badly somewhere.

Sure.  We want it to visibly break what was subtly broken, so that the
breakage can be seen and corrected without an attempted attack.

Alexander


Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories

2017-11-30 Thread Solar Designer
On Thu, Nov 30, 2017 at 04:53:06PM +, David Laight wrote:
> From: Salvatore Mesoraca
> > if a program tries to open a file, in a sticky directory,
> > with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> > This feature allows to detect and potentially block programs that
> > act this way, it can be used to find vulnerabilities (like those
> > prevented by patch #1) and to do policy enforcement.

> I presume the 'vulnerabilities' are related to symlinks being created
> just before the open?

That's one way to exploit them.  We already have a sysctl to try and
block that specific attack, and we're considering adding more, for other
attacks.  But we'd also like an easy way to learn of the vulnerabilities
without anyone trying to exploit them yet, hence this patch.

> Trouble is this change breaks a lot of general use of /tmp.

That's general misuse of /tmp.  Things like "command > /tmp/file"
without having pre-created the file with O_EXCL e.g. by mktemp(1).

> I always assumed that code that cared would use O_EXCL and
> everything else wasn't worth subverting.

Opinions will vary on whether it's worth subverting or not, and someone
may reasonably want to configure this differently on different systems.

> I found code in vi (and elsewhere) that subverted these checks
> by opening with O_WRONLY if stat() showed the file existed and
> O_CREAT|O_EXCL if it didn't.

Yes, such misuses of /tmp and the like by use of those programs - where
the potential consequences would often be less severe (if your example
above is correct, it sounds like it'd be data spoofing but not
overwriting another file over a link) - could remain unnoticed.

> I'm pretty sure that traditionally a lot of these opens were done
> with O_CREAT|O_TRUNC.
> Implementing that as unlink() followed by a create would stop
> 'random' (ok all) symlinks being followed.

That's racy.  We have O_EXCL, and it must be used along with O_CREAT
when the directory is untrusted.  (If it were only about symlinks, we
also already have O_NOFOLLOW.)

> Overall I'm pretty sure this change will break things badly somewhere.

Sure.  We want it to visibly break what was subtly broken, so that the
breakage can be seen and corrected without an attempted attack.

Alexander


Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories

2017-11-30 Thread Solar Designer
Replying to Salvatore and Ian at once, and CC'ing H. Peter Anvin and
Karel Zak for util-linux flock(1).

On Thu, Nov 30, 2017 at 02:57:06PM +, Ian Campbell wrote:
> On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote:
> > 2017-11-27 1:26 GMT+01:00 Solar Designer <so...@openwall.com>:
> > > Why would "shared lock files based on flock()" need O_CREAT without
> > > O_EXCL?  Where specifically are they currently used that way?
> > 
> > I don't think that they *need* to act like this, but this is how
> > util-linux's flock(1) currently works.

Oh, but you never referred to flock(1) in there, so I read flock() as
implying flock(2).  I think util-linux's flock(1) is relatively obscure,
and as far as I can tell it isn't documented anywhere whether it may be
used on untrusted lock file directories or not.  A revision of the
flock(1) man page seen on RHEL7 gives really weird examples using /tmp.
The current util-linux-2.31/sys-utils/flock.1 is similar.  strace'ing
those examples, I see flock(1) literally uses the /tmp directory itself
(not a file inside) as the lock, which surprisingly appears to work.
Checking the util-linux tree, it looks like this was added as a feature.
I suppose the good news is this doesn't appear to allow for worse than a
DoS against the script using flock(1) (since a different user can also
take that lock), unless any of the upper level directories are also
untrusted.  The man page as seen at https://linux.die.net/man/1/flock
currently only lists a more complex example with /var/lock/mylockfile,
where flock(1) itself is asked to access it via a pre-opened fd.
Permissions on /var/lock vary between distros (e.g., a RHEL6'ish system
I just checked has it as 775 root:lock, whereas a RHEL7'ish has it as
symlink to ../run/lock, which is 755 root:root).  Anyway, these are just
examples.  The reality is people will use flock(1) in various directories,
some of them trusted and some not.  If our proposed policy can detect and
optionally break some uses in untrusted directories, that's good since
such uses are currently unsafe (see below).

> > And it doesn't seem an unreasonable behavior from their perspective,
> 
> I thought that too, specifically I reasoned that using O_EXCL would
> defeat the purpose of the _shared_ flock(), wouldn't it?

No.  O_EXCL means the attempt to O_CREAT will fail, at which point the
program can retry without both of these flags.  (The actual lock is
obtained later via flock(2) or fcntl(2) anyway.)  In fact, flock(1)
already does something similar, as tested using the very first example
from the man page on a RHEL7'ish system:

$ strace flock /tmp -c cat
[...]
open("/tmp", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = -1 EISDIR (Is a directory)
open("/tmp", O_RDONLY|O_NOCTTY) = 3
flock(3, LOCK_EX)   = 0

As you can see, when O_CREAT failed, the program retried without that
flag.  It could just as easily have tried O_CREAT|O_EXCL, then dropped
both at once.  Testing with a lock file (rather than lock directory):

$ strace flock /tmp/lockfile -c cat
[...]
open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX)   = 0

This use of flock(1) would be a worse vulnerability, not limited to DoS
against the script itself but also allowing for privilege escalation
unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
would help (reduce the impact back to DoS against itself), and would
require that the retry logic (like what is seen in the lock directory
example above) be present.

On Thu, Nov 30, 2017 at 03:39:48PM +0100, Salvatore Mesoraca wrote:
> so maybe other programs do that too.

Maybe, and they would be similarly vulnerable if they do that in
untrusted directories, and they would also need to be fixed.

A subtle case is when something like this is technically done in a
directory writable by someone else, but is not necessarily crossing
what the program's author or the sysadmin recognize as a privilege
boundary.  Maybe they have accepted that this one pseudo-user can
attack that other one anyway, such as because under a certain daemon's
privsep model the would-be attacking pseudo-user is necessarily more
privileged anyway.  This is why we shouldn't by default extend this
policy to all directories writable by someone else, but instead we
consider introducing a sysctl with varying policy levels - initially
focusing on sticky directories writable by someone else and only later
optionally extending to non-sticky directories writable by someone else.
The latter mode would have even greater focus on development and
debugging rather than on production use, although I imagine that
specific systems could eventually afford it in production as well.

> I was citing that case just to make it clear that, if someone gets
> a warning because of flock(1), they shouldn't be worried about 

Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories

2017-11-30 Thread Solar Designer
Replying to Salvatore and Ian at once, and CC'ing H. Peter Anvin and
Karel Zak for util-linux flock(1).

On Thu, Nov 30, 2017 at 02:57:06PM +, Ian Campbell wrote:
> On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote:
> > 2017-11-27 1:26 GMT+01:00 Solar Designer :
> > > Why would "shared lock files based on flock()" need O_CREAT without
> > > O_EXCL?  Where specifically are they currently used that way?
> > 
> > I don't think that they *need* to act like this, but this is how
> > util-linux's flock(1) currently works.

Oh, but you never referred to flock(1) in there, so I read flock() as
implying flock(2).  I think util-linux's flock(1) is relatively obscure,
and as far as I can tell it isn't documented anywhere whether it may be
used on untrusted lock file directories or not.  A revision of the
flock(1) man page seen on RHEL7 gives really weird examples using /tmp.
The current util-linux-2.31/sys-utils/flock.1 is similar.  strace'ing
those examples, I see flock(1) literally uses the /tmp directory itself
(not a file inside) as the lock, which surprisingly appears to work.
Checking the util-linux tree, it looks like this was added as a feature.
I suppose the good news is this doesn't appear to allow for worse than a
DoS against the script using flock(1) (since a different user can also
take that lock), unless any of the upper level directories are also
untrusted.  The man page as seen at https://linux.die.net/man/1/flock
currently only lists a more complex example with /var/lock/mylockfile,
where flock(1) itself is asked to access it via a pre-opened fd.
Permissions on /var/lock vary between distros (e.g., a RHEL6'ish system
I just checked has it as 775 root:lock, whereas a RHEL7'ish has it as
symlink to ../run/lock, which is 755 root:root).  Anyway, these are just
examples.  The reality is people will use flock(1) in various directories,
some of them trusted and some not.  If our proposed policy can detect and
optionally break some uses in untrusted directories, that's good since
such uses are currently unsafe (see below).

> > And it doesn't seem an unreasonable behavior from their perspective,
> 
> I thought that too, specifically I reasoned that using O_EXCL would
> defeat the purpose of the _shared_ flock(), wouldn't it?

No.  O_EXCL means the attempt to O_CREAT will fail, at which point the
program can retry without both of these flags.  (The actual lock is
obtained later via flock(2) or fcntl(2) anyway.)  In fact, flock(1)
already does something similar, as tested using the very first example
from the man page on a RHEL7'ish system:

$ strace flock /tmp -c cat
[...]
open("/tmp", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = -1 EISDIR (Is a directory)
open("/tmp", O_RDONLY|O_NOCTTY) = 3
flock(3, LOCK_EX)   = 0

As you can see, when O_CREAT failed, the program retried without that
flag.  It could just as easily have tried O_CREAT|O_EXCL, then dropped
both at once.  Testing with a lock file (rather than lock directory):

$ strace flock /tmp/lockfile -c cat
[...]
open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX)   = 0

This use of flock(1) would be a worse vulnerability, not limited to DoS
against the script itself but also allowing for privilege escalation
unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
would help (reduce the impact back to DoS against itself), and would
require that the retry logic (like what is seen in the lock directory
example above) be present.

On Thu, Nov 30, 2017 at 03:39:48PM +0100, Salvatore Mesoraca wrote:
> so maybe other programs do that too.

Maybe, and they would be similarly vulnerable if they do that in
untrusted directories, and they would also need to be fixed.

A subtle case is when something like this is technically done in a
directory writable by someone else, but is not necessarily crossing
what the program's author or the sysadmin recognize as a privilege
boundary.  Maybe they have accepted that this one pseudo-user can
attack that other one anyway, such as because under a certain daemon's
privsep model the would-be attacking pseudo-user is necessarily more
privileged anyway.  This is why we shouldn't by default extend this
policy to all directories writable by someone else, but instead we
consider introducing a sysctl with varying policy levels - initially
focusing on sticky directories writable by someone else and only later
optionally extending to non-sticky directories writable by someone else.
The latter mode would have even greater focus on development and
debugging rather than on production use, although I imagine that
specific systems could eventually afford it in production as well.

> I was citing that case just to make it clear that, if someone gets
> a warning because of flock(1), they shouldn't be worried about it.

Checking the uti

Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories

2017-11-26 Thread Solar Designer
On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> 2017-11-24 11:53 GMT+01:00 David Laight :
> > From: Alan Cox
> >> Sent: 22 November 2017 16:52
> >>
> >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca 
> >>  wrote:
> >>
> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> >> > group writable directories, even if the file doesn't exist yet.
> >> > With few exceptions (e.g. shared lock files based on flock())

Why would "shared lock files based on flock()" need O_CREAT without
O_EXCL?  Where specifically are they currently used that way?  My guess
would be a group-writable mail spool (that would be fcntl() locks on
Linux now, but it's the same thing for the purpose of this discussion),
but even then I don't see a need for such open flags.  If a program does
that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
retry without these to open the existing file, then flock() either way).

> >> Enough exceptions to make it a bad idea.

Maybe, but as Salvatore points out we're considering this for rather
limited use - easy opt-in policy enforcement during software development
and auditing (years ago, I was using an LKM for this purpose, which
caught some issues in Postfix that were promptly patched), and maybe on
specific production systems where the sysadmins know what they're doing.

> >> Firstly if you care this much *stop* having shared writable directories.
> >> We have namespaces, you don't need them. You can give every user their
> >> own /tmp etc.

This is good advice (and I've been doing similar with pam_mktemp, prior
to namespaces), but it's not exactly for the same use case.

Also, there are shared writable directories that have to remain shared
unless more things are reworked.  For example, mail spools and crontab
directories, which could be avoided by having the corresponding files in
user homes instead (and it's been done), but that's also a related risk
(a privileged process accessing a directory writable by a user, even if
accessing only for reading, lowering the risk impact).

> > Looks like a very bad idea to me as well.
> >
> > Doesn't this stop all shell redirects into a shared /tmp ?

The above sounds like it'd be something bad - but it's actually just
right for an opt-in policy like this.  A command like "program >
/tmp/file" is generally unsafe (unless the file was pre-created safely,
see below) and should be avoided.  The fact that a lot of documentation
gives commands of this kind only emphasizes the need for easy policy
enforcement against such misuses of /tmp.

Ideally, we'd stop most "shell redirects into a shared /tmp" - all but
those into files correctly pre-created with mktemp(1) and the like.

Technically, we could achieve similar by allowing O_CREAT without O_EXCL
if the file exists _and_ has the same owner as our current fsuid or the
owner is root.  Most scripts that use mktemp(1) then use that file
without switching privileges, so they'll continue working.  If there's
an exception to that on a system where I'd configure a policy like
what's discussed here, I'd rather learn of it and be interested in
looking at that unusual script.  That script would be taking a risk by
making one (pseudo-)user (or root) write into a file in /tmp created
(even if initially safely) by another pseudo-user (not root), thus
letting the latter pseudo-user attack the former (or root).

Ditto for programs using mkstemp(3), but then somehow going through the
pathname rather than the fd.  That's not ideal, but it's sometimes safe
and sometimes makes sense (such as when they need to exec other programs
only accepting a pathname), so the same exception and approach applies.

> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> > files - they'll all stop working.

That would be great.  Unfortunately (in this context), you've identified
exceptions, where programs try to be smart but are not necessarily safe.

It's beneficial to be able to easily stop at least some misuses of /tmp
and the like, even if we can't stop all.

> If some program does such a thing, that's a potential vulnerability.

Exactly.

> With "protected_hardlinks" you are, in most cases, safe.

And "protected_symlinks", and not in terms of data spoofing attacks via
FIFOs and regular files (the initial focus of this patchset).

> But, still, that program has a bug and having this feature enabled will
> help you notice it soon.
> For that matter, I'm using this patch on my system and I don't have any
> program behaving like this.
> 
> > If there are some directories where you need to force O_EXCL you
> > need to make it a property of the directory, not the kernel.
> >
> > David

Good idea, but this would need to be checked and enforced by the kernel
anyhow, so it's a matter of coarse vs. fine policy.  Coarse is much
easier to implement and to start using, albeit perhaps mostly not by
general-purpose distros, but by 

Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories

2017-11-26 Thread Solar Designer
On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> 2017-11-24 11:53 GMT+01:00 David Laight :
> > From: Alan Cox
> >> Sent: 22 November 2017 16:52
> >>
> >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca 
> >>  wrote:
> >>
> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> >> > group writable directories, even if the file doesn't exist yet.
> >> > With few exceptions (e.g. shared lock files based on flock())

Why would "shared lock files based on flock()" need O_CREAT without
O_EXCL?  Where specifically are they currently used that way?  My guess
would be a group-writable mail spool (that would be fcntl() locks on
Linux now, but it's the same thing for the purpose of this discussion),
but even then I don't see a need for such open flags.  If a program does
that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
retry without these to open the existing file, then flock() either way).

> >> Enough exceptions to make it a bad idea.

Maybe, but as Salvatore points out we're considering this for rather
limited use - easy opt-in policy enforcement during software development
and auditing (years ago, I was using an LKM for this purpose, which
caught some issues in Postfix that were promptly patched), and maybe on
specific production systems where the sysadmins know what they're doing.

> >> Firstly if you care this much *stop* having shared writable directories.
> >> We have namespaces, you don't need them. You can give every user their
> >> own /tmp etc.

This is good advice (and I've been doing similar with pam_mktemp, prior
to namespaces), but it's not exactly for the same use case.

Also, there are shared writable directories that have to remain shared
unless more things are reworked.  For example, mail spools and crontab
directories, which could be avoided by having the corresponding files in
user homes instead (and it's been done), but that's also a related risk
(a privileged process accessing a directory writable by a user, even if
accessing only for reading, lowering the risk impact).

> > Looks like a very bad idea to me as well.
> >
> > Doesn't this stop all shell redirects into a shared /tmp ?

The above sounds like it'd be something bad - but it's actually just
right for an opt-in policy like this.  A command like "program >
/tmp/file" is generally unsafe (unless the file was pre-created safely,
see below) and should be avoided.  The fact that a lot of documentation
gives commands of this kind only emphasizes the need for easy policy
enforcement against such misuses of /tmp.

Ideally, we'd stop most "shell redirects into a shared /tmp" - all but
those into files correctly pre-created with mktemp(1) and the like.

Technically, we could achieve similar by allowing O_CREAT without O_EXCL
if the file exists _and_ has the same owner as our current fsuid or the
owner is root.  Most scripts that use mktemp(1) then use that file
without switching privileges, so they'll continue working.  If there's
an exception to that on a system where I'd configure a policy like
what's discussed here, I'd rather learn of it and be interested in
looking at that unusual script.  That script would be taking a risk by
making one (pseudo-)user (or root) write into a file in /tmp created
(even if initially safely) by another pseudo-user (not root), thus
letting the latter pseudo-user attack the former (or root).

Ditto for programs using mkstemp(3), but then somehow going through the
pathname rather than the fd.  That's not ideal, but it's sometimes safe
and sometimes makes sense (such as when they need to exec other programs
only accepting a pathname), so the same exception and approach applies.

> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> > files - they'll all stop working.

That would be great.  Unfortunately (in this context), you've identified
exceptions, where programs try to be smart but are not necessarily safe.

It's beneficial to be able to easily stop at least some misuses of /tmp
and the like, even if we can't stop all.

> If some program does such a thing, that's a potential vulnerability.

Exactly.

> With "protected_hardlinks" you are, in most cases, safe.

And "protected_symlinks", and not in terms of data spoofing attacks via
FIFOs and regular files (the initial focus of this patchset).

> But, still, that program has a bug and having this feature enabled will
> help you notice it soon.
> For that matter, I'm using this patch on my system and I don't have any
> program behaving like this.
> 
> > If there are some directories where you need to force O_EXCL you
> > need to make it a property of the directory, not the kernel.
> >
> > David

Good idea, but this would need to be checked and enforced by the kernel
anyhow, so it's a matter of coarse vs. fine policy.  Coarse is much
easier to implement and to start using, albeit perhaps mostly not by
general-purpose distros, but by a software developer/auditor, an
adventurous 

Re: [kernel-hardening] [PATCH v2 0/5] stackprotector: ascii armor the stack canary

2017-09-20 Thread Solar Designer
On Wed, Sep 20, 2017 at 01:18:04PM +0200, Yann Droneaud wrote:
> Le mardi 19 septembre 2017 ?? 19:16 +0200, Solar Designer a ??crit :
> >
> > We could put/require a NUL in the middle of the canary,
> > but with the full canary being only 64-bit at most that would also
> > make some attacks easier.
> 
> Are you suggesting to randomly select which byte to set to 0 in each
> canary ?

Definitely not.  That's only 8 different possibilities per canary, and
the weakest one will affect exploitability in each scenario.  So that
would be a fairly clear change to the worse.

I suggest that we make no further changes at this time, unless someone
comes up with an idea that would clearly hurt exploitation more than it
helps exploitation, overall across different scenarios.

Alexander


Re: [kernel-hardening] [PATCH v2 0/5] stackprotector: ascii armor the stack canary

2017-09-20 Thread Solar Designer
On Wed, Sep 20, 2017 at 01:18:04PM +0200, Yann Droneaud wrote:
> Le mardi 19 septembre 2017 ?? 19:16 +0200, Solar Designer a ??crit :
> >
> > We could put/require a NUL in the middle of the canary,
> > but with the full canary being only 64-bit at most that would also
> > make some attacks easier.
> 
> Are you suggesting to randomly select which byte to set to 0 in each
> canary ?

Definitely not.  That's only 8 different possibilities per canary, and
the weakest one will affect exploitability in each scenario.  So that
would be a fairly clear change to the worse.

I suggest that we make no further changes at this time, unless someone
comes up with an idea that would clearly hurt exploitation more than it
helps exploitation, overall across different scenarios.

Alexander


Re: [kernel-hardening] [PATCH v2 0/5] stackprotector: ascii armor the stack canary

2017-09-19 Thread Solar Designer
On Wed, May 24, 2017 at 11:57:46AM -0400, r...@redhat.com wrote:
> Zero out the first byte of the stack canary value on 64 bit systems,
> in order to mitigate unterminated C string overflows.
> 
> The null byte both prevents C string functions from reading the
> canary, and from writing it if the canary value were guessed or
> obtained through some other means.
> 
> Reducing the entropy by 8 bits is acceptable on 64-bit systems,
> which will still have 56 bits of entropy left, but not on 32
> bit systems, so the "ascii armor" canary is only implemented on
> 64-bit systems.
> 
> Inspired by the "ascii armor" code in execshield and Daniel Micay's
> linux-hardened tree.
> 
> Also see https://github.com/thestinger/linux-hardened/

Brad trolls us all lightly with this trivia question:

https://twitter.com/grsecurity/status/905246423591084033

"For #trivia can you describe one scenario where this change actually
helps exploitation using similar C string funcs?"

I suppose the expected answer is:

The change helps exploitation when the overwriting string ends just
before the canary.  Its NUL overwriting the NUL byte in the canary would
go undetected.  Before this change, there would be a 255/256 chance of
detection.

I hope this was considered.  The change might still be a good tradeoff,
or it might not, depending on which scenarios are more likely (leak of
canary value or the string required in an exploit ending at that exact
byte location), and we probably lack such statistics.

I am not proposing to revert the change.  I had actually contemplated
speaking up when this was discussed, but did not for lack of a better
suggestion.  We could put/require a NUL in the middle of the canary, but
with the full canary being only 64-bit at most that would also make some
attacks easier.

So this is JFYI.  No action needed on it, I think.

Alexander


Re: [kernel-hardening] [PATCH v2 0/5] stackprotector: ascii armor the stack canary

2017-09-19 Thread Solar Designer
On Wed, May 24, 2017 at 11:57:46AM -0400, r...@redhat.com wrote:
> Zero out the first byte of the stack canary value on 64 bit systems,
> in order to mitigate unterminated C string overflows.
> 
> The null byte both prevents C string functions from reading the
> canary, and from writing it if the canary value were guessed or
> obtained through some other means.
> 
> Reducing the entropy by 8 bits is acceptable on 64-bit systems,
> which will still have 56 bits of entropy left, but not on 32
> bit systems, so the "ascii armor" canary is only implemented on
> 64-bit systems.
> 
> Inspired by the "ascii armor" code in execshield and Daniel Micay's
> linux-hardened tree.
> 
> Also see https://github.com/thestinger/linux-hardened/

Brad trolls us all lightly with this trivia question:

https://twitter.com/grsecurity/status/905246423591084033

"For #trivia can you describe one scenario where this change actually
helps exploitation using similar C string funcs?"

I suppose the expected answer is:

The change helps exploitation when the overwriting string ends just
before the canary.  Its NUL overwriting the NUL byte in the canary would
go undetected.  Before this change, there would be a 255/256 chance of
detection.

I hope this was considered.  The change might still be a good tradeoff,
or it might not, depending on which scenarios are more likely (leak of
canary value or the string required in an exploit ending at that exact
byte location), and we probably lack such statistics.

I am not proposing to revert the change.  I had actually contemplated
speaking up when this was discussed, but did not for lack of a better
suggestion.  We could put/require a NUL in the middle of the canary, but
with the full canary being only 64-bit at most that would also make some
attacks easier.

So this is JFYI.  No action needed on it, I think.

Alexander


Re: [RFC] Restrict writes into untrusted FIFOs and regular files

2017-09-19 Thread Solar Designer
On Tue, Sep 19, 2017 at 06:06:15PM +0200, Salvatore Mesoraca wrote:
> 2017-09-19 2:37 GMT+02:00 Solar Designer <so...@openwall.com>:
> > On Mon, Sep 18, 2017 at 02:00:50PM -0700, Kees Cook wrote:
> >> On Fri, Sep 15, 2017 at 1:43 AM, Salvatore Mesoraca 
> >> <s.mesorac...@gmail.com> wrote:
> >> > +protected_regular_files:
> >> > +
> >> > +This protection is similar to protected_fifos, but it
> >> > +avoids writes to an attacker-controlled regular file, where program
> >> > +expected to create one.
> >> > +
> >> > +When set to "0", regular files writing is unrestricted.
> >> > +
> >> > +When set to "1" don't allow O_CREAT open on regular files that we
> >> > +don't own in world writable sticky directories, unless they are
> >> > +owned by the owner of the directory.
[...]
> > Although this is sufficient against attacks (if the kernel's check for
> > these properties is not racy; I don't know if it is), for the policy
> > enforcement use case and reason we might want to support a simpler mode
> > where O_CREAT without O_EXCL would be disallowed in sticky directories
> > (only world writable? or also writable by anyone other than us? - e.g.,
> > it'd catch some unsafe uses of mail spools) regardless of whether a
> > file of that name already exists or not.  Maybe extra settings:
> >
> > When set to "2" also don't allow O_CREAT open without O_EXCL in
> > world-writable sticky directories (even if the files don't already
> > exist, for consistent policy enforcement)
> >
> > When set to "3" also don't allow O_CREAT open on regular files that we
> > don't own in sticky directories writable by anyone else, unless the
> > files are owned by the owner of the directory.
> >
> > When set to "4" also don't allow O_CREAT open without O_EXCL in
> > sticky directories writable by anyone else (even if the files don't
> > already exist, for consistent policy enforcement)
> >
> > Or maybe "2" and "4" should be a separate knob, so that "3" could be
> > used without the policy enforcement aspect of "2", although enabling
> > this separate knob at the highest level would make protected_regular
> > redundant.
> >
> > I could envision further levels for non-sticky world-writable and
> > writable-by-others directories, and even for unsafe writes without
> > O_CREAT and unsafe reads, but then the protected_regular name would
> > become even more misleading as without O_CREAT the program could
> > actually intend to work with a non-regular file.
> >
> > Let's avoid further scope creep for now, but have this in mind.  As I
> > had mentioned in another thread on kernel-hardening, policy enforcement
> > like this implemented in a kernel module helped me find weaknesses in
> > old Postfix' privsep implementation, which were promptly patched (that
> > was many years ago).  Having this generally available and easy to enable
> > could result in more findings like this by more people.
> >
> > A setting similar to "3" above should probably also exist for
> > protected_fifos (would be "2" there).
> 
> I think I could add "3" to both protected_fifos and protected_regulars
> actually using "2" for both. And then add another sysctl for modes
> "2" and "4" for both fifos and regular files.

Sounds good to me.  The third sysctl (or several) could be introduced
with a separate patch, focusing on file access safety policy warnings
and enforcement in general rather than on any specific file types.

Alexander


Re: [RFC] Restrict writes into untrusted FIFOs and regular files

2017-09-19 Thread Solar Designer
On Tue, Sep 19, 2017 at 06:06:15PM +0200, Salvatore Mesoraca wrote:
> 2017-09-19 2:37 GMT+02:00 Solar Designer :
> > On Mon, Sep 18, 2017 at 02:00:50PM -0700, Kees Cook wrote:
> >> On Fri, Sep 15, 2017 at 1:43 AM, Salvatore Mesoraca 
> >>  wrote:
> >> > +protected_regular_files:
> >> > +
> >> > +This protection is similar to protected_fifos, but it
> >> > +avoids writes to an attacker-controlled regular file, where program
> >> > +expected to create one.
> >> > +
> >> > +When set to "0", regular files writing is unrestricted.
> >> > +
> >> > +When set to "1" don't allow O_CREAT open on regular files that we
> >> > +don't own in world writable sticky directories, unless they are
> >> > +owned by the owner of the directory.
[...]
> > Although this is sufficient against attacks (if the kernel's check for
> > these properties is not racy; I don't know if it is), for the policy
> > enforcement use case and reason we might want to support a simpler mode
> > where O_CREAT without O_EXCL would be disallowed in sticky directories
> > (only world writable? or also writable by anyone other than us? - e.g.,
> > it'd catch some unsafe uses of mail spools) regardless of whether a
> > file of that name already exists or not.  Maybe extra settings:
> >
> > When set to "2" also don't allow O_CREAT open without O_EXCL in
> > world-writable sticky directories (even if the files don't already
> > exist, for consistent policy enforcement)
> >
> > When set to "3" also don't allow O_CREAT open on regular files that we
> > don't own in sticky directories writable by anyone else, unless the
> > files are owned by the owner of the directory.
> >
> > When set to "4" also don't allow O_CREAT open without O_EXCL in
> > sticky directories writable by anyone else (even if the files don't
> > already exist, for consistent policy enforcement)
> >
> > Or maybe "2" and "4" should be a separate knob, so that "3" could be
> > used without the policy enforcement aspect of "2", although enabling
> > this separate knob at the highest level would make protected_regular
> > redundant.
> >
> > I could envision further levels for non-sticky world-writable and
> > writable-by-others directories, and even for unsafe writes without
> > O_CREAT and unsafe reads, but then the protected_regular name would
> > become even more misleading as without O_CREAT the program could
> > actually intend to work with a non-regular file.
> >
> > Let's avoid further scope creep for now, but have this in mind.  As I
> > had mentioned in another thread on kernel-hardening, policy enforcement
> > like this implemented in a kernel module helped me find weaknesses in
> > old Postfix' privsep implementation, which were promptly patched (that
> > was many years ago).  Having this generally available and easy to enable
> > could result in more findings like this by more people.
> >
> > A setting similar to "3" above should probably also exist for
> > protected_fifos (would be "2" there).
> 
> I think I could add "3" to both protected_fifos and protected_regulars
> actually using "2" for both. And then add another sysctl for modes
> "2" and "4" for both fifos and regular files.

Sounds good to me.  The third sysctl (or several) could be introduced
with a separate patch, focusing on file access safety policy warnings
and enforcement in general rather than on any specific file types.

Alexander


Re: [RFC] Restrict writes into untrusted FIFOs and regular files

2017-09-18 Thread Solar Designer
On Mon, Sep 18, 2017 at 02:00:50PM -0700, Kees Cook wrote:
> On Fri, Sep 15, 2017 at 1:43 AM, Salvatore Mesoraca  
> wrote:
> > The purpose is to make data spoofing attacks harder.
> 
> Do you have any examples of attacks (CVEs, blog posts, etc) that you
> could link to in this commit?

I doubt there are (m)any examples of attacks and blog posts on this,
because most systems didn't have similar (sym)link restrictions until
recently and those attacks are simpler.  As to CVEs, I expect that a
large subset of CVEs assigned to "improper usage of temporary files" or
"symlink attacks" or such are about vulnerabilities that are also
exploitable into data spoofing, even if perhaps with different (and
often lesser) impact than they are when exploited via (sym)links.

IIRC, the attacks were first pointed out on LKML circa 1997 as an
argument against (sym)link restrictions - that they're not enough on
their own against some attacks anyway, because FIFOs and even regular
files could also be used for those.  My response was to include FIFO
restrictions (on top of (sym)link restrictions) into -ow patches, and
now the same became relevant for mainline (as it has the optional
(sym)link restrictions now).

One thing that has changed since is the addition of inotify, which
probably made use of regular files for similar attacks much more
practical.  This is one reason why we might also protect against
maybe-maliciously pre-created regular files now.

Another reason is that we, or at least some users/sysadmins, may want to
discourage misuse of /tmp, /dev/shm, and similar directories in general.
Those directories are supposed to only be accessed through appropriate
library interfaces, or using specific known-safe procedures.  Trying to
O_CREAT a file in there without O_EXCL is almost never right.  It makes
sense to provide an easy way to disallow that as policy enforcement.

> bike shed: I think "_files" is redundant, since we're already in proc/sys/fs/

+1

> > +protected_regular_files:
> > +
> > +This protection is similar to protected_fifos, but it
> > +avoids writes to an attacker-controlled regular file, where program
> > +expected to create one.
> > +
> > +When set to "0", regular files writing is unrestricted.
> > +
> > +When set to "1" don't allow O_CREAT open on regular files that we
> > +don't own in world writable sticky directories, unless they are
> > +owned by the owner of the directory.
> > +
> > +This protection is based on the restrictions in Openwall.

While symlink/hardlink/FIFO restrictions were in -ow patches, the
regular file restriction was not - we're just inventing it now, even if
upon my suggestion.

Although this is sufficient against attacks (if the kernel's check for
these properties is not racy; I don't know if it is), for the policy
enforcement use case and reason we might want to support a simpler mode
where O_CREAT without O_EXCL would be disallowed in sticky directories
(only world writable? or also writable by anyone other than us? - e.g.,
it'd catch some unsafe uses of mail spools) regardless of whether a
file of that name already exists or not.  Maybe extra settings:

When set to "2" also don't allow O_CREAT open without O_EXCL in
world-writable sticky directories (even if the files don't already
exist, for consistent policy enforcement)

When set to "3" also don't allow O_CREAT open on regular files that we
don't own in sticky directories writable by anyone else, unless the
files are owned by the owner of the directory.

When set to "4" also don't allow O_CREAT open without O_EXCL in
sticky directories writable by anyone else (even if the files don't
already exist, for consistent policy enforcement)

Or maybe "2" and "4" should be a separate knob, so that "3" could be
used without the policy enforcement aspect of "2", although enabling
this separate knob at the highest level would make protected_regular
redundant.

I could envision further levels for non-sticky world-writable and
writable-by-others directories, and even for unsafe writes without
O_CREAT and unsafe reads, but then the protected_regular name would
become even more misleading as without O_CREAT the program could
actually intend to work with a non-regular file.

Let's avoid further scope creep for now, but have this in mind.  As I
had mentioned in another thread on kernel-hardening, policy enforcement
like this implemented in a kernel module helped me find weaknesses in
old Postfix' privsep implementation, which were promptly patched (that
was many years ago).  Having this generally available and easy to enable
could result in more findings like this by more people.

A setting similar to "3" above should probably also exist for
protected_fifos (would be "2" there).

> Otherwise, yeah, looks good! :)

+1

Alexander


Re: [RFC] Restrict writes into untrusted FIFOs and regular files

2017-09-18 Thread Solar Designer
On Mon, Sep 18, 2017 at 02:00:50PM -0700, Kees Cook wrote:
> On Fri, Sep 15, 2017 at 1:43 AM, Salvatore Mesoraca  
> wrote:
> > The purpose is to make data spoofing attacks harder.
> 
> Do you have any examples of attacks (CVEs, blog posts, etc) that you
> could link to in this commit?

I doubt there are (m)any examples of attacks and blog posts on this,
because most systems didn't have similar (sym)link restrictions until
recently and those attacks are simpler.  As to CVEs, I expect that a
large subset of CVEs assigned to "improper usage of temporary files" or
"symlink attacks" or such are about vulnerabilities that are also
exploitable into data spoofing, even if perhaps with different (and
often lesser) impact than they are when exploited via (sym)links.

IIRC, the attacks were first pointed out on LKML circa 1997 as an
argument against (sym)link restrictions - that they're not enough on
their own against some attacks anyway, because FIFOs and even regular
files could also be used for those.  My response was to include FIFO
restrictions (on top of (sym)link restrictions) into -ow patches, and
now the same became relevant for mainline (as it has the optional
(sym)link restrictions now).

One thing that has changed since is the addition of inotify, which
probably made use of regular files for similar attacks much more
practical.  This is one reason why we might also protect against
maybe-maliciously pre-created regular files now.

Another reason is that we, or at least some users/sysadmins, may want to
discourage misuse of /tmp, /dev/shm, and similar directories in general.
Those directories are supposed to only be accessed through appropriate
library interfaces, or using specific known-safe procedures.  Trying to
O_CREAT a file in there without O_EXCL is almost never right.  It makes
sense to provide an easy way to disallow that as policy enforcement.

> bike shed: I think "_files" is redundant, since we're already in proc/sys/fs/

+1

> > +protected_regular_files:
> > +
> > +This protection is similar to protected_fifos, but it
> > +avoids writes to an attacker-controlled regular file, where program
> > +expected to create one.
> > +
> > +When set to "0", regular files writing is unrestricted.
> > +
> > +When set to "1" don't allow O_CREAT open on regular files that we
> > +don't own in world writable sticky directories, unless they are
> > +owned by the owner of the directory.
> > +
> > +This protection is based on the restrictions in Openwall.

While symlink/hardlink/FIFO restrictions were in -ow patches, the
regular file restriction was not - we're just inventing it now, even if
upon my suggestion.

Although this is sufficient against attacks (if the kernel's check for
these properties is not racy; I don't know if it is), for the policy
enforcement use case and reason we might want to support a simpler mode
where O_CREAT without O_EXCL would be disallowed in sticky directories
(only world writable? or also writable by anyone other than us? - e.g.,
it'd catch some unsafe uses of mail spools) regardless of whether a
file of that name already exists or not.  Maybe extra settings:

When set to "2" also don't allow O_CREAT open without O_EXCL in
world-writable sticky directories (even if the files don't already
exist, for consistent policy enforcement)

When set to "3" also don't allow O_CREAT open on regular files that we
don't own in sticky directories writable by anyone else, unless the
files are owned by the owner of the directory.

When set to "4" also don't allow O_CREAT open without O_EXCL in
sticky directories writable by anyone else (even if the files don't
already exist, for consistent policy enforcement)

Or maybe "2" and "4" should be a separate knob, so that "3" could be
used without the policy enforcement aspect of "2", although enabling
this separate knob at the highest level would make protected_regular
redundant.

I could envision further levels for non-sticky world-writable and
writable-by-others directories, and even for unsafe writes without
O_CREAT and unsafe reads, but then the protected_regular name would
become even more misleading as without O_CREAT the program could
actually intend to work with a non-regular file.

Let's avoid further scope creep for now, but have this in mind.  As I
had mentioned in another thread on kernel-hardening, policy enforcement
like this implemented in a kernel module helped me find weaknesses in
old Postfix' privsep implementation, which were promptly patched (that
was many years ago).  Having this generally available and easy to enable
could result in more findings like this by more people.

A setting similar to "3" above should probably also exist for
protected_fifos (would be "2" there).

> Otherwise, yeah, looks good! :)

+1

Alexander


Re: [vs-plain] Re: [PATCH] mm: larger stack guard gap, between vmas

2017-07-05 Thread Solar Designer
Hi all,

On Tue, Jul 04, 2017 at 07:16:06PM -0600, kseifr...@redhat.com wrote:
> This issue occurs post stackguard patches correct? Fixing it sounds like
> this might go beyond hardening and into CVE territory.

Since this thread is public on LKML, as it should be, it's no longer
valid to be CC'ed to linux-distros, which is for handling of embargoed
issues only.  So please drop linux-distros from further CC's (I moved
linux-distros to Bcc on this reply, just so they know what happened).

If specific security issues are identified (such as with LibreOffice and
Java), then ideally those should be posted to oss-security as separate
reports.  I'd appreciate it if anyone takes care of that (regardless of
CVE worthiness).

In fact, I already mentioned this thread in:

http://www.openwall.com/lists/oss-security/2017/07/05/11

Thank you!

Alexander


Re: [vs-plain] Re: [PATCH] mm: larger stack guard gap, between vmas

2017-07-05 Thread Solar Designer
Hi all,

On Tue, Jul 04, 2017 at 07:16:06PM -0600, kseifr...@redhat.com wrote:
> This issue occurs post stackguard patches correct? Fixing it sounds like
> this might go beyond hardening and into CVE territory.

Since this thread is public on LKML, as it should be, it's no longer
valid to be CC'ed to linux-distros, which is for handling of embargoed
issues only.  So please drop linux-distros from further CC's (I moved
linux-distros to Bcc on this reply, just so they know what happened).

If specific security issues are identified (such as with LibreOffice and
Java), then ideally those should be posted to oss-security as separate
reports.  I'd appreciate it if anyone takes care of that (regardless of
CVE worthiness).

In fact, I already mentioned this thread in:

http://www.openwall.com/lists/oss-security/2017/07/05/11

Thank you!

Alexander


Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-23 Thread Solar Designer
> >>> On Mon, May 22, 2017 at 2:08 PM, Solar Designer <so...@openwall.com> 
> >>> wrote:
> >>> > On Mon, May 22, 2017 at 01:57:03PM +0200, Djalal Harouni wrote:
> >>> >> *) When modules_autoload_mode is set to (2), automatic module loading 
> >>> >> is
> >>> >> disabled for all. Once set, this value can not be changed.
> >>> >
> >>> > What purpose does this securelevel-like property ("Once set, this value
> >>> > can not be changed.") serve here?  I think this mode 2 is needed, but
> >>> > without this extra property, which is bypassable by e.g. explicitly
> >>> > loaded kernel modules anyway (and that's OK).

On Mon, May 22, 2017 at 04:07:56PM -0700, Kees Cook wrote:
> I'm on the fence. For modules_disabled and Yama, it was tied to
> CAP_SYS_ADMIN, basically designed to be a at-boot setting that could
> not later be undone by an attacker gaining that privilege, keeping
> them out of either kernel memory or existing user process memory.
> Here, it's CAP_SYS_MODULE... it's hard to imagine the situation where
> a CAP_SYS_MODULE-capable process could write to this sysctl but NOT
> issue direct modprobe requests, but it's _possible_ via crazy symlink
> games to trick capable processes into writing to sysctls. We've seen
> this multiple times before, and it's a way for attackers to turn a
> single privileged write into a privileged exec.

OK, tricking a process via crazy symlink games is finally a potentially
valid reason.  The question then becomes: are there perhaps so many
other important sysctl's, disk files, etc. (which the vulnerable capable
process could similarly be tricked into writing) so that specifically
resetting modules_autoload_mode isn't particularly lucrative?  I think
that the answer to that is usually yes.  Another related question: do we
really want to inconsistently single out a handful of sysctl's for this
kind of extra protection?  I think not.

I agree there are some other settings where being unable to reset them
makes sense, but I think this isn't one of those.

> I might turn the question around, though: why would we want to have it
> changeable at this setting?

Convenience for the sysadmin - being able to correct one's error (e.g.,
wrong order of shell commands), respond to new findings (thought module
autoloading was unneeded after some point, then found out some software
relies on it), change one's mind, reuse a system differently than
originally intended without a forced reboot.

> I'm fine leaving that piece off, either way.

I'm also fine with either decision.  I just thought I'd point out what
looked weird to me.

I think this is an important patch that should get in, but primarily
for modules_autoload_mode=1, which many distros could make the default
(and maybe the kernel eventually should?)

For modules_autoload_mode=2, we already seem to have the equivalent of
modprobe=/bin/true (or does it differ subtly, maybe in return values?),
which I already use at startup on a GPU box like this (preloading
modules so that the OpenCL backends wouldn't need the autoloading):

nvidia-smi
nvidia-modprobe -u -c=0
#modprobe nvidia_uvm
#modprobe fglrx

sysctl -w kernel.modprobe=/bin/true
sysctl -w kernel.hotplug=/bin/true

but it's good to also have this supported more explicitly and more
consistently through modules_autoload_mode=2 while we're at it.  So I
support having this mode as well.  I just question the need to have it
non-resettable.

Alexander


Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-23 Thread Solar Designer
> >>> On Mon, May 22, 2017 at 2:08 PM, Solar Designer  
> >>> wrote:
> >>> > On Mon, May 22, 2017 at 01:57:03PM +0200, Djalal Harouni wrote:
> >>> >> *) When modules_autoload_mode is set to (2), automatic module loading 
> >>> >> is
> >>> >> disabled for all. Once set, this value can not be changed.
> >>> >
> >>> > What purpose does this securelevel-like property ("Once set, this value
> >>> > can not be changed.") serve here?  I think this mode 2 is needed, but
> >>> > without this extra property, which is bypassable by e.g. explicitly
> >>> > loaded kernel modules anyway (and that's OK).

On Mon, May 22, 2017 at 04:07:56PM -0700, Kees Cook wrote:
> I'm on the fence. For modules_disabled and Yama, it was tied to
> CAP_SYS_ADMIN, basically designed to be a at-boot setting that could
> not later be undone by an attacker gaining that privilege, keeping
> them out of either kernel memory or existing user process memory.
> Here, it's CAP_SYS_MODULE... it's hard to imagine the situation where
> a CAP_SYS_MODULE-capable process could write to this sysctl but NOT
> issue direct modprobe requests, but it's _possible_ via crazy symlink
> games to trick capable processes into writing to sysctls. We've seen
> this multiple times before, and it's a way for attackers to turn a
> single privileged write into a privileged exec.

OK, tricking a process via crazy symlink games is finally a potentially
valid reason.  The question then becomes: are there perhaps so many
other important sysctl's, disk files, etc. (which the vulnerable capable
process could similarly be tricked into writing) so that specifically
resetting modules_autoload_mode isn't particularly lucrative?  I think
that the answer to that is usually yes.  Another related question: do we
really want to inconsistently single out a handful of sysctl's for this
kind of extra protection?  I think not.

I agree there are some other settings where being unable to reset them
makes sense, but I think this isn't one of those.

> I might turn the question around, though: why would we want to have it
> changeable at this setting?

Convenience for the sysadmin - being able to correct one's error (e.g.,
wrong order of shell commands), respond to new findings (thought module
autoloading was unneeded after some point, then found out some software
relies on it), change one's mind, reuse a system differently than
originally intended without a forced reboot.

> I'm fine leaving that piece off, either way.

I'm also fine with either decision.  I just thought I'd point out what
looked weird to me.

I think this is an important patch that should get in, but primarily
for modules_autoload_mode=1, which many distros could make the default
(and maybe the kernel eventually should?)

For modules_autoload_mode=2, we already seem to have the equivalent of
modprobe=/bin/true (or does it differ subtly, maybe in return values?),
which I already use at startup on a GPU box like this (preloading
modules so that the OpenCL backends wouldn't need the autoloading):

nvidia-smi
nvidia-modprobe -u -c=0
#modprobe nvidia_uvm
#modprobe fglrx

sysctl -w kernel.modprobe=/bin/true
sysctl -w kernel.hotplug=/bin/true

but it's good to also have this supported more explicitly and more
consistently through modules_autoload_mode=2 while we're at it.  So I
support having this mode as well.  I just question the need to have it
non-resettable.

Alexander


Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-22 Thread Solar Designer
On Mon, May 22, 2017 at 03:49:15PM +0200, Djalal Harouni wrote:
> On Mon, May 22, 2017 at 2:08 PM, Solar Designer <so...@openwall.com> wrote:
> > On Mon, May 22, 2017 at 01:57:03PM +0200, Djalal Harouni wrote:
> >> *) When modules_autoload_mode is set to (2), automatic module loading is
> >> disabled for all. Once set, this value can not be changed.
> >
> > What purpose does this securelevel-like property ("Once set, this value
> > can not be changed.") serve here?  I think this mode 2 is needed, but
> > without this extra property, which is bypassable by e.g. explicitly
> > loaded kernel modules anyway (and that's OK).
> 
> My reasoning about "Once set, this value can not be changed" is mainly for:
> 
> If you have some systems where modules are not updated for any given
> reason, then the only one who will be able to load a module is an
> administrator, basically this is a shortcut for:
> 
> * Apps/services can run with CAP_NET_ADMIN but they are not allowed to
> auto-load 'netdev' modules.
> 
> * Explicitly loading modules can be guarded by seccomp filters *per*
> app, so even if these apps have
>   CAP_SYS_MODULE they won't be able to explicitly load modules, one
> has to remount some sysctl /proc/ entries read-only here and remove
> CAP_SYS_ADMIN for all apps anyway.
> 
> This mainly serves the purpose of these systems that do not receive
> updates, if I don't want to expose those kernel interfaces what should
> I do ? then if I want to unload old versions and replace them with new
> ones what operation should be allowed ? and only real root of the
> system can do it. Hence, the "Once set, this value can not be changed"
> is more of a shortcut, also the idea was put in my mind based on how
> "modules_disabled" is disabled forever, and some other interfaces. I
> would say: it is easy to handle a transition from 1) "hey this system
> is still up to date, some features should be exposed" to 2) "this
> system is not up to date anymore, only root should expose some
> features..."
> 
> Hmm, I am not sure if this answers your question ? :-)

This answers my question, but in a way that I summarize as "there's no
good reason to include this securelevel-like property".

> I definitively don't want to fall into "modules_disabled" trap where
> is it too strict! "Once set, this value can not be changed" means for
> some users do not set it otherwise the system is unusable...
> 
> Maybe an extra "4" mode for that ? better get it right.

I think you should simply exclude this property from mode 2.

The module autoloading restrictions aren't meant to reduce root's
powers; they're only meant to protect processes from shooting themselves
and the system in the foot inadvertently (confused deputy).

modules_disabled may be different in that respect, although with the
rest of the kernel lacking securelevel-like support the point is moot.

We had working securelevel in 2.0.34 through 2.0.40 inclusive, but
we've lost it in 2.1+ with cap-bound apparently never becoming as
complete a replacement for it and having been lost/broken further in
2.6.25+.  I regret this, but that's a different story.  Like I say,
module autoloading doesn't even fit in with those restrictions - it's
about a totally different threat model.

Alexander


Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-22 Thread Solar Designer
On Mon, May 22, 2017 at 03:49:15PM +0200, Djalal Harouni wrote:
> On Mon, May 22, 2017 at 2:08 PM, Solar Designer  wrote:
> > On Mon, May 22, 2017 at 01:57:03PM +0200, Djalal Harouni wrote:
> >> *) When modules_autoload_mode is set to (2), automatic module loading is
> >> disabled for all. Once set, this value can not be changed.
> >
> > What purpose does this securelevel-like property ("Once set, this value
> > can not be changed.") serve here?  I think this mode 2 is needed, but
> > without this extra property, which is bypassable by e.g. explicitly
> > loaded kernel modules anyway (and that's OK).
> 
> My reasoning about "Once set, this value can not be changed" is mainly for:
> 
> If you have some systems where modules are not updated for any given
> reason, then the only one who will be able to load a module is an
> administrator, basically this is a shortcut for:
> 
> * Apps/services can run with CAP_NET_ADMIN but they are not allowed to
> auto-load 'netdev' modules.
> 
> * Explicitly loading modules can be guarded by seccomp filters *per*
> app, so even if these apps have
>   CAP_SYS_MODULE they won't be able to explicitly load modules, one
> has to remount some sysctl /proc/ entries read-only here and remove
> CAP_SYS_ADMIN for all apps anyway.
> 
> This mainly serves the purpose of these systems that do not receive
> updates, if I don't want to expose those kernel interfaces what should
> I do ? then if I want to unload old versions and replace them with new
> ones what operation should be allowed ? and only real root of the
> system can do it. Hence, the "Once set, this value can not be changed"
> is more of a shortcut, also the idea was put in my mind based on how
> "modules_disabled" is disabled forever, and some other interfaces. I
> would say: it is easy to handle a transition from 1) "hey this system
> is still up to date, some features should be exposed" to 2) "this
> system is not up to date anymore, only root should expose some
> features..."
> 
> Hmm, I am not sure if this answers your question ? :-)

This answers my question, but in a way that I summarize as "there's no
good reason to include this securelevel-like property".

> I definitively don't want to fall into "modules_disabled" trap where
> is it too strict! "Once set, this value can not be changed" means for
> some users do not set it otherwise the system is unusable...
> 
> Maybe an extra "4" mode for that ? better get it right.

I think you should simply exclude this property from mode 2.

The module autoloading restrictions aren't meant to reduce root's
powers; they're only meant to protect processes from shooting themselves
and the system in the foot inadvertently (confused deputy).

modules_disabled may be different in that respect, although with the
rest of the kernel lacking securelevel-like support the point is moot.

We had working securelevel in 2.0.34 through 2.0.40 inclusive, but
we've lost it in 2.1+ with cap-bound apparently never becoming as
complete a replacement for it and having been lost/broken further in
2.6.25+.  I regret this, but that's a different story.  Like I say,
module autoloading doesn't even fit in with those restrictions - it's
about a totally different threat model.

Alexander


Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-22 Thread Solar Designer
Hi Djalal,

Thank you for your work on this!

On Mon, May 22, 2017 at 01:57:03PM +0200, Djalal Harouni wrote:
> *) When modules_autoload_mode is set to (2), automatic module loading is
> disabled for all. Once set, this value can not be changed.

What purpose does this securelevel-like property ("Once set, this value
can not be changed.") serve here?  I think this mode 2 is needed, but
without this extra property, which is bypassable by e.g. explicitly
loaded kernel modules anyway (and that's OK).

I'm sorry if this has been discussed before.

Alexander


Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-22 Thread Solar Designer
Hi Djalal,

Thank you for your work on this!

On Mon, May 22, 2017 at 01:57:03PM +0200, Djalal Harouni wrote:
> *) When modules_autoload_mode is set to (2), automatic module loading is
> disabled for all. Once set, this value can not be changed.

What purpose does this securelevel-like property ("Once set, this value
can not be changed.") serve here?  I think this mode 2 is needed, but
without this extra property, which is bypassable by e.g. explicitly
loaded kernel modules anyway (and that's OK).

I'm sorry if this has been discussed before.

Alexander


Re: [oss-security] Linux kernel ping socket / AF_LLC connect() sin_family race

2017-03-24 Thread Solar Designer
On Fri, Mar 24, 2017 at 03:21:06PM -0700, Eric Dumazet wrote:
> Looks easy enough to fix ?

Oh.  Probably.  Thanks.  Need to test, but I guess you already did?

> diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
> index
> 2af6244b83e27ae384e96cf071c10c5a89674804..ccfbce13a6333a65dab64e4847dd510dfafb1b43
> 100644
> --- a/net/ipv4/ping.c
> +++ b/net/ipv4/ping.c
> @@ -156,17 +156,18 @@ int ping_hash(struct sock *sk)
>  void ping_unhash(struct sock *sk)
>  {
> struct inet_sock *isk = inet_sk(sk);
> +
> pr_debug("ping_unhash(isk=%p,isk->num=%u)\n", isk, isk->inet_num);
> +   write_lock_bh(_table.lock);
> if (sk_hashed(sk)) {
> -   write_lock_bh(_table.lock);
> hlist_nulls_del(>sk_nulls_node);
> sk_nulls_node_init(>sk_nulls_node);
> sock_put(sk);
> isk->inet_num = 0;
> isk->inet_sport = 0;
> sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
> -   write_unlock_bh(_table.lock);
> }
> +   write_unlock_bh(_table.lock);
>  }
>  EXPORT_SYMBOL_GPL(ping_unhash);

FWIW, in Pavel's original implementation for 2.4.32 (unused), this was:

static void ping_v4_unhash(struct sock *sk)
{
DEBUG(("ping_v4_unhash(sk=%p,sk->num=%u)\n", sk, sk->num));
write_lock_bh(_hash_lock);
if (sk->pprev) {
if (sk->next)
   sk->next->pprev = sk->pprev;
*sk->pprev = sk->next;
sk->pprev = NULL;
sk->num = 0;
sock_prot_dec_use(sk->prot);
__sock_put(sk);
}
write_unlock_bh(_hash_lock);
}

Looks like the erroneous optimization (not expecting concurrent activity
on the same socket?) was introduced during conversion to 2.6's hlists.

So far this cursed function had 3 bugs, two of them security (including
this one) and one probably benign (or if not, then effectively a subset
of this bug as it performed some unneeded / stale debugging work before
acquiring the lock), with all 3 introduced in forward-porting.  Maybe
the nature of forward-porting activity makes people relatively
inattentive ("compiles with the new interfaces and still works? must be
correct"), compared to when writing new code.

Anyhow, I share some responsibility for this mess, for having advocated
this patch being forward-ported and merged back then.  I still like
having this functionality and its userspace security benefits... but I
don't like the kernel bugs.

Alexander


Re: [oss-security] Linux kernel ping socket / AF_LLC connect() sin_family race

2017-03-24 Thread Solar Designer
On Fri, Mar 24, 2017 at 03:21:06PM -0700, Eric Dumazet wrote:
> Looks easy enough to fix ?

Oh.  Probably.  Thanks.  Need to test, but I guess you already did?

> diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
> index
> 2af6244b83e27ae384e96cf071c10c5a89674804..ccfbce13a6333a65dab64e4847dd510dfafb1b43
> 100644
> --- a/net/ipv4/ping.c
> +++ b/net/ipv4/ping.c
> @@ -156,17 +156,18 @@ int ping_hash(struct sock *sk)
>  void ping_unhash(struct sock *sk)
>  {
> struct inet_sock *isk = inet_sk(sk);
> +
> pr_debug("ping_unhash(isk=%p,isk->num=%u)\n", isk, isk->inet_num);
> +   write_lock_bh(_table.lock);
> if (sk_hashed(sk)) {
> -   write_lock_bh(_table.lock);
> hlist_nulls_del(>sk_nulls_node);
> sk_nulls_node_init(>sk_nulls_node);
> sock_put(sk);
> isk->inet_num = 0;
> isk->inet_sport = 0;
> sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
> -   write_unlock_bh(_table.lock);
> }
> +   write_unlock_bh(_table.lock);
>  }
>  EXPORT_SYMBOL_GPL(ping_unhash);

FWIW, in Pavel's original implementation for 2.4.32 (unused), this was:

static void ping_v4_unhash(struct sock *sk)
{
DEBUG(("ping_v4_unhash(sk=%p,sk->num=%u)\n", sk, sk->num));
write_lock_bh(_hash_lock);
if (sk->pprev) {
if (sk->next)
   sk->next->pprev = sk->pprev;
*sk->pprev = sk->next;
sk->pprev = NULL;
sk->num = 0;
sock_prot_dec_use(sk->prot);
__sock_put(sk);
}
write_unlock_bh(_hash_lock);
}

Looks like the erroneous optimization (not expecting concurrent activity
on the same socket?) was introduced during conversion to 2.6's hlists.

So far this cursed function had 3 bugs, two of them security (including
this one) and one probably benign (or if not, then effectively a subset
of this bug as it performed some unneeded / stale debugging work before
acquiring the lock), with all 3 introduced in forward-porting.  Maybe
the nature of forward-porting activity makes people relatively
inattentive ("compiles with the new interfaces and still works? must be
correct"), compared to when writing new code.

Anyhow, I share some responsibility for this mess, for having advocated
this patch being forward-ported and merged back then.  I still like
having this functionality and its userspace security benefits... but I
don't like the kernel bugs.

Alexander


non-response to IPI kills wrong(?) task, confuses logs

2016-09-20 Thread Solar Designer
Hi,

Adam Zabrocki, CC'ed here, informed me of the following:

There is an undesirable situation in SMP Linux machines when sending an
IPI via the smp_call_function_single() API:

/*
 * smp_call_function_single - Run a function on a specific CPU
 * @func: The function to run. This must be fast and non-blocking.
 * @info: An arbitrary pointer to pass to the function.
 * @wait: If true, wait until function has completed on other CPUs.
 *
 * Returns 0 on success, else a negative status code.
 */
int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 int wait)

It runs "func" on a specific CPU, and if the "wait" flag is true, it
waits until "func" has completed work before returning to the caller.
This is a useful feature used to control when it is safe to continue and
potentially parse data generated by another CPU.

Unfortunately, Linux can kill the task waiting on its
smp_call_function_single().  The NMI watchdog is detecting a "soft
lockup" on the CPU which calls smp_call_function_single() with the
"wait" flag set.  In practice it is not always the desirable behavior.
It is possible that the target CPU of the IPI was non-responsive and
would never (or not in time) pick up the IPI sent to it.  From the
perspective of the task sending the IPI, the target CPU never finished
the work, so smp_call_function_single() continues its busy wait loop.
In the meantime, the NMI watchdog can kill the busy-waiting task instead
of somehow "unlocking" the CPU that did not handle the IPI in time.
This is often visible in the logs like e.g.:

Jul 18 00:43:06 ubuntu kernel: [1578895.078397] NMI watchdog: BUG: soft lockup 
- CPU#1 stuck for 25s! [thermald:714]
...
Jul 18 00:43:06 ubuntu kernel: [1578895.078470] CPU: 1 PID: 714 Comm: thermald 
Tainted: GW  OEL  4.2.0-30-generic #36-Ubuntu
Jul 18 00:43:06 ubuntu kernel: [1578895.078472] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
07/02/2015
Jul 18 00:43:06 ubuntu kernel: [1578895.078475] task: 88003776e040 ti: 
88003839 task.ti: 88003839
Jul 18 00:43:06 ubuntu kernel: [1578895.078477] RIP: 0010:[]  
[] smp_call_function_single+0xd8/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078486] RSP: 0018:880038393c98  
EFLAGS: 0202
Jul 18 00:43:06 ubuntu kernel: [1578895.078487] RAX:  RBX: 
 RCX: 
Jul 18 00:43:06 ubuntu kernel: [1578895.078489] RDX: 0001 RSI: 
0100 RDI: 0292
Jul 18 00:43:06 ubuntu kernel: [1578895.078490] RBP: 880038393ce8 R08: 
 R09: 0001
Jul 18 00:43:06 ubuntu kernel: [1578895.078492] R10: 8800175de000 R11: 
880035123180 R12: 880034c1c2d0
Jul 18 00:43:06 ubuntu kernel: [1578895.078493] R13: 0292 R14: 
0292 R15: 880038393c38
Jul 18 00:43:06 ubuntu kernel: [1578895.078495] FS:  7fa118895700() 
GS:88003d64() knlGS:
Jul 18 00:43:06 ubuntu kernel: [1578895.078497] CS:  0010 DS:  ES:  
CR0: 80050033
Jul 18 00:43:06 ubuntu kernel: [1578895.078499] CR2: 7fabda352000 CR3: 
34c22000 CR4: 001406e0
Jul 18 00:43:06 ubuntu kernel: [1578895.078508] Stack:
Jul 18 00:43:06 ubuntu kernel: [1578895.078510]   
8121e2c4 880038393cd8 
Jul 18 00:43:06 ubuntu kernel: [1578895.078513]  813f86b0 
880038393d00 0003 5a1685d7
Jul 18 00:43:06 ubuntu kernel: [1578895.078515]  880038393d4c 
880038393d48 880038393d38 813f884d
Jul 18 00:43:06 ubuntu kernel: [1578895.078518] Call Trace:
Jul 18 00:43:06 ubuntu kernel: [1578895.078526]  [] ? 
mntput+0x24/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078532]  [] ? 
ucs2_strncmp+0x50/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078642]  [] 
rdmsr_on_cpu+0x5d/0x90
Jul 18 00:43:06 ubuntu kernel: [1578895.078650]  [] 
show_temp+0xa7/0xe0 [coretemp]
Jul 18 00:43:06 ubuntu kernel: [1578895.078656]  [] 
dev_attr_show+0x20/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078662]  [] ? 
mutex_lock+0x16/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078667]  [] 
sysfs_kf_seq_show+0xbc/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078669]  [] 
kernfs_seq_show+0x23/0x30
Jul 18 00:43:06 ubuntu kernel: [1578895.078672]  [] 
seq_read+0xec/0x390
Jul 18 00:43:06 ubuntu kernel: [1578895.078675]  [] 
kernfs_fop_read+0x10a/0x160
Jul 18 00:43:06 ubuntu kernel: [1578895.078679]  [] ? 
security_file_permission+0xa0/0xc0
Jul 18 00:43:06 ubuntu kernel: [1578895.078682]  [] 
__vfs_read+0x18/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078685]  [] 
vfs_read+0x87/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078687]  [] 
SyS_read+0x55/0xc0
Jul 18 00:43:07 ubuntu kernel: [1578895.078690]  [] 
entry_SYSCALL_64_fastpath+0x16/0x75
Jul 18 00:43:07 ubuntu kernel: [1578895.078692] Code: 25 28 00 00 00 75 70 48 
83 c4 40 5b 41 5c 5d c3 48 8d 75 c8 

non-response to IPI kills wrong(?) task, confuses logs

2016-09-20 Thread Solar Designer
Hi,

Adam Zabrocki, CC'ed here, informed me of the following:

There is an undesirable situation in SMP Linux machines when sending an
IPI via the smp_call_function_single() API:

/*
 * smp_call_function_single - Run a function on a specific CPU
 * @func: The function to run. This must be fast and non-blocking.
 * @info: An arbitrary pointer to pass to the function.
 * @wait: If true, wait until function has completed on other CPUs.
 *
 * Returns 0 on success, else a negative status code.
 */
int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 int wait)

It runs "func" on a specific CPU, and if the "wait" flag is true, it
waits until "func" has completed work before returning to the caller.
This is a useful feature used to control when it is safe to continue and
potentially parse data generated by another CPU.

Unfortunately, Linux can kill the task waiting on its
smp_call_function_single().  The NMI watchdog is detecting a "soft
lockup" on the CPU which calls smp_call_function_single() with the
"wait" flag set.  In practice it is not always the desirable behavior.
It is possible that the target CPU of the IPI was non-responsive and
would never (or not in time) pick up the IPI sent to it.  From the
perspective of the task sending the IPI, the target CPU never finished
the work, so smp_call_function_single() continues its busy wait loop.
In the meantime, the NMI watchdog can kill the busy-waiting task instead
of somehow "unlocking" the CPU that did not handle the IPI in time.
This is often visible in the logs like e.g.:

Jul 18 00:43:06 ubuntu kernel: [1578895.078397] NMI watchdog: BUG: soft lockup 
- CPU#1 stuck for 25s! [thermald:714]
...
Jul 18 00:43:06 ubuntu kernel: [1578895.078470] CPU: 1 PID: 714 Comm: thermald 
Tainted: GW  OEL  4.2.0-30-generic #36-Ubuntu
Jul 18 00:43:06 ubuntu kernel: [1578895.078472] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
07/02/2015
Jul 18 00:43:06 ubuntu kernel: [1578895.078475] task: 88003776e040 ti: 
88003839 task.ti: 88003839
Jul 18 00:43:06 ubuntu kernel: [1578895.078477] RIP: 0010:[]  
[] smp_call_function_single+0xd8/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078486] RSP: 0018:880038393c98  
EFLAGS: 0202
Jul 18 00:43:06 ubuntu kernel: [1578895.078487] RAX:  RBX: 
 RCX: 
Jul 18 00:43:06 ubuntu kernel: [1578895.078489] RDX: 0001 RSI: 
0100 RDI: 0292
Jul 18 00:43:06 ubuntu kernel: [1578895.078490] RBP: 880038393ce8 R08: 
 R09: 0001
Jul 18 00:43:06 ubuntu kernel: [1578895.078492] R10: 8800175de000 R11: 
880035123180 R12: 880034c1c2d0
Jul 18 00:43:06 ubuntu kernel: [1578895.078493] R13: 0292 R14: 
0292 R15: 880038393c38
Jul 18 00:43:06 ubuntu kernel: [1578895.078495] FS:  7fa118895700() 
GS:88003d64() knlGS:
Jul 18 00:43:06 ubuntu kernel: [1578895.078497] CS:  0010 DS:  ES:  
CR0: 80050033
Jul 18 00:43:06 ubuntu kernel: [1578895.078499] CR2: 7fabda352000 CR3: 
34c22000 CR4: 001406e0
Jul 18 00:43:06 ubuntu kernel: [1578895.078508] Stack:
Jul 18 00:43:06 ubuntu kernel: [1578895.078510]   
8121e2c4 880038393cd8 
Jul 18 00:43:06 ubuntu kernel: [1578895.078513]  813f86b0 
880038393d00 0003 5a1685d7
Jul 18 00:43:06 ubuntu kernel: [1578895.078515]  880038393d4c 
880038393d48 880038393d38 813f884d
Jul 18 00:43:06 ubuntu kernel: [1578895.078518] Call Trace:
Jul 18 00:43:06 ubuntu kernel: [1578895.078526]  [] ? 
mntput+0x24/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078532]  [] ? 
ucs2_strncmp+0x50/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078642]  [] 
rdmsr_on_cpu+0x5d/0x90
Jul 18 00:43:06 ubuntu kernel: [1578895.078650]  [] 
show_temp+0xa7/0xe0 [coretemp]
Jul 18 00:43:06 ubuntu kernel: [1578895.078656]  [] 
dev_attr_show+0x20/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078662]  [] ? 
mutex_lock+0x16/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078667]  [] 
sysfs_kf_seq_show+0xbc/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078669]  [] 
kernfs_seq_show+0x23/0x30
Jul 18 00:43:06 ubuntu kernel: [1578895.078672]  [] 
seq_read+0xec/0x390
Jul 18 00:43:06 ubuntu kernel: [1578895.078675]  [] 
kernfs_fop_read+0x10a/0x160
Jul 18 00:43:06 ubuntu kernel: [1578895.078679]  [] ? 
security_file_permission+0xa0/0xc0
Jul 18 00:43:06 ubuntu kernel: [1578895.078682]  [] 
__vfs_read+0x18/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078685]  [] 
vfs_read+0x87/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078687]  [] 
SyS_read+0x55/0xc0
Jul 18 00:43:07 ubuntu kernel: [1578895.078690]  [] 
entry_SYSCALL_64_fastpath+0x16/0x75
Jul 18 00:43:07 ubuntu kernel: [1578895.078692] Code: 25 28 00 00 00 75 70 48 
83 c4 40 5b 41 5c 5d c3 48 8d 75 c8 

Re: [patch] lib: check for strcpy() overflows to fixed length buffers

2014-04-30 Thread Solar Designer
On Wed, Apr 30, 2014 at 06:08:44PM +0300, Dan Carpenter wrote:
> There are sometimes where we know that we are doing an strcpy() into a
> fixed length buffer.  In those cases, we could verify that the strcpy()
> doesn't overflow.  This patch introduces DEBUG_STRICT_SLOW_STRCPY_CHECKS
> if you want to check for that.

FWIW, I had posted similar macros for userland strcpy() and friends to
the security-audit list (now defunct) in 1998.  Someone preserved a copy
here (although the indentation is lost):

http://www.opennet.ru/soft/0432.html

In (weird) use, with proper indentation:

http://www.merit.edu/mail.archives/nanog/2000-02/msg00366.html
https://github.com/tureba/trinoo/blob/master/strfix.h

Personally, I was using this at the time for building known-broken
software like wu-ftpd, where the risk of false positives felt lower than
the risk of buffer overflow bugs being in fact present in the code.

I used gcc's Statement Exprs extension, which is also used in the Linux
kernel a lot:

http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

So maybe you should, too.  (That is, if you want to go ahead with this
approach for code that isn't meant to be as broken as wu-ftpd was.)
This lets us propagate the original return value.

To determine the destination buffer size, I simply used sizeof() and
skipped my added protection in case the size looked like that of a
pointer.  Now you have those nice new gcc features instead. :-)

> The downside is that it makes strcpy slower.

I guess the slowdown is mostly from the added strlen().  I avoided it by
using strncat(), so I had truncation instead of detection.  It is
unclear which is better.

Other functions I did this for are strcat(), sprintf(), vsprintf().

Alexander
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] lib: check for strcpy() overflows to fixed length buffers

2014-04-30 Thread Solar Designer
On Wed, Apr 30, 2014 at 06:08:44PM +0300, Dan Carpenter wrote:
 There are sometimes where we know that we are doing an strcpy() into a
 fixed length buffer.  In those cases, we could verify that the strcpy()
 doesn't overflow.  This patch introduces DEBUG_STRICT_SLOW_STRCPY_CHECKS
 if you want to check for that.

FWIW, I had posted similar macros for userland strcpy() and friends to
the security-audit list (now defunct) in 1998.  Someone preserved a copy
here (although the indentation is lost):

http://www.opennet.ru/soft/0432.html

In (weird) use, with proper indentation:

http://www.merit.edu/mail.archives/nanog/2000-02/msg00366.html
https://github.com/tureba/trinoo/blob/master/strfix.h

Personally, I was using this at the time for building known-broken
software like wu-ftpd, where the risk of false positives felt lower than
the risk of buffer overflow bugs being in fact present in the code.

I used gcc's Statement Exprs extension, which is also used in the Linux
kernel a lot:

http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

So maybe you should, too.  (That is, if you want to go ahead with this
approach for code that isn't meant to be as broken as wu-ftpd was.)
This lets us propagate the original return value.

To determine the destination buffer size, I simply used sizeof() and
skipped my added protection in case the size looked like that of a
pointer.  Now you have those nice new gcc features instead. :-)

 The downside is that it makes strcpy slower.

I guess the slowdown is mostly from the added strlen().  I avoided it by
using strncat(), so I had truncation instead of detection.  It is
unclear which is better.

Other functions I did this for are strcat(), sprintf(), vsprintf().

Alexander
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/