Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-23 Thread Catalin Marinas
On Fri, Oct 23, 2020 at 07:13:17AM +0100, Szabolcs Nagy wrote:
> The 10/22/2020 10:31, Catalin Marinas wrote:
> > IIUC, the problem is with the main executable which is mapped by the
> > kernel without PROT_BTI. The dynamic loader wants to set PROT_BTI but
> > does not have the original file descriptor to be able to remap. Its only
> > choice is mprotect() and this fails because of the MDWX policy.
> > 
> > Not sure whether the kernel has the right information but could it map
> > the main executable with PROT_BTI if the corresponding PT_GNU_PROPERTY
> > is found? The current ABI states it only sets PROT_BTI for the
> > interpreter who'd be responsible for setting the PROT_BTI on the main
> > executable. I can't tell whether it would break anything but it's worth
> > a try:
> 
> i think it would work, but now i can't easily
> tell from the libc if i have to do the mprotect
> on the main exe or not.
> 
> i guess i can just always mprotect and ignore
> the failure?

I replied to Keys before reading your email. So yeah, still issue
mprotect() but ignore the failure.

-- 
Catalin


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-23 Thread Szabolcs Nagy
The 10/22/2020 10:31, Catalin Marinas wrote:
> IIUC, the problem is with the main executable which is mapped by the
> kernel without PROT_BTI. The dynamic loader wants to set PROT_BTI but
> does not have the original file descriptor to be able to remap. Its only
> choice is mprotect() and this fails because of the MDWX policy.
> 
> Not sure whether the kernel has the right information but could it map
> the main executable with PROT_BTI if the corresponding PT_GNU_PROPERTY
> is found? The current ABI states it only sets PROT_BTI for the
> interpreter who'd be responsible for setting the PROT_BTI on the main
> executable. I can't tell whether it would break anything but it's worth
> a try:

i think it would work, but now i can't easily
tell from the libc if i have to do the mprotect
on the main exe or not.

i guess i can just always mprotect and ignore
the failure?

> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 4784011cecac..0a08fb9133e8 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -730,14 +730,6 @@ asmlinkage void __sched arm64_preempt_schedule_irq(void)
>  int arch_elf_adjust_prot(int prot, const struct arch_elf_state *state,
>bool has_interp, bool is_interp)
>  {
> - /*
> -  * For dynamically linked executables the interpreter is
> -  * responsible for setting PROT_BTI on everything except
> -  * itself.
> -  */
> - if (is_interp != has_interp)
> - return prot;
> -
>   if (!(state->flags & ARM64_ELF_BTI))
>   return prot;
>  
> 
> -- 
> Catalin

-- 


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Florian Weimer
* Topi Miettinen:

> Allowing mprotect(PROT_EXEC|PROT_BTI) would mean that all you need to
> circumvent MDWX is to add PROT_BTI flag. I'd suggest getting the flags 
> right at mmap() time or failing that, reverting the PROT_BTI for
> legacy programs later.
>
> Could the kernel tell the loader of the BTI situation with auxiliary
> vectors? Then it would be easy for the loader to always use the best 
> mmap() flags without ever needing to mprotect().

I think what we want is a mprotect2 call with a flags argument (separate
from protection flags) that tells the kernel that the request *removes*
protection flags and should fail otherwise.  seccomp could easily filter
that then.

But like the other proposals, the migration story isn't great.  You
would need kernel and seccomp/systemd etc. updates before glibc starts
working, even if glibc has a fallback from mprotect2 to mprotect
(because the latter would be blocked).

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Topi Miettinen

On 22.10.2020 12.31, Catalin Marinas wrote:

On Thu, Oct 22, 2020 at 10:38:23AM +0200, Lennart Poettering wrote:

On Do, 22.10.20 09:29, Szabolcs Nagy (szabolcs.n...@arm.com) wrote:

The dynamic loader has to process the LOAD segments to get to the ELF
note that says to enable BTI.  Maybe we could do a first pass and load
only the segments that cover notes.  But that requires lots of changes
to generic code in the loader.


What if the loader always enabled BTI for PROT_EXEC pages, but then when
discovering that this was a mistake, mprotect() the pages without BTI? Then
both BTI and MDWX would work and the penalty of not getting MDWX would fall
to non-BTI programs. What's the expected proportion of BTI enabled code vs.
disabled in the future, is it perhaps expected that a distro would enable
the flag globally so eventually only a few legacy programs might be
unprotected?


i thought mprotect(PROT_EXEC) would get filtered
with or without bti, is that not the case?


We can adjust the filter in systemd to match any combination of
flags to allow and to deny.


Yes but Szabolcs' point to Topi was that if we can adjust the filters to
allow mprotect(PROT_EXEC), why not allow mprotect(PROT_EXEC|PROT_BTI)
instead? Anyway, I see the MDWX and BTI as complementary policies so
ideally we shouldn't have to choose between one or the other. If we
allow mprotect(PROT_EXEC), that would override MDWX and also disable
BTI.


Allowing mprotect(PROT_EXEC|PROT_BTI) would mean that all you need to 
circumvent MDWX is to add PROT_BTI flag. I'd suggest getting the flags 
right at mmap() time or failing that, reverting the PROT_BTI for legacy 
programs later.


Could the kernel tell the loader of the BTI situation with auxiliary 
vectors? Then it would be easy for the loader to always use the best 
mmap() flags without ever needing to mprotect().


-Topi


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Topi Miettinen

On 22.10.2020 11.29, Szabolcs Nagy wrote:

The 10/22/2020 11:17, Topi Miettinen via Libc-alpha wrote:

On 22.10.2020 10.54, Florian Weimer wrote:

* Lennart Poettering:

Did you see Topi's comments on the systemd issue?

https://github.com/systemd/systemd/issues/17368#issuecomment-710485532

I think I agree with this: it's a bit weird to alter the bits after
the fact. Can't glibc set up everything right from the begining? That
would keep both concepts working.


The dynamic loader has to process the LOAD segments to get to the ELF
note that says to enable BTI.  Maybe we could do a first pass and load
only the segments that cover notes.  But that requires lots of changes
to generic code in the loader.


What if the loader always enabled BTI for PROT_EXEC pages, but then when
discovering that this was a mistake, mprotect() the pages without BTI? Then
both BTI and MDWX would work and the penalty of not getting MDWX would fall
to non-BTI programs. What's the expected proportion of BTI enabled code vs.
disabled in the future, is it perhaps expected that a distro would enable
the flag globally so eventually only a few legacy programs might be
unprotected?


i thought mprotect(PROT_EXEC) would get filtered
with or without bti, is that not the case?


It would be filtered, but the idea is that with modern binaries this 
would not happen since the pages would be mapped with mmap(,, PROT_EXEC 
| PROT_BTI,,) which is OK for purposes MDWX. The loader would have to 
use mprotect(PROT_EXEC) to get rid of PROT_BTI only for the legacy binaries.


-Topi


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Catalin Marinas
On Thu, Oct 22, 2020 at 10:38:23AM +0200, Lennart Poettering wrote:
> On Do, 22.10.20 09:29, Szabolcs Nagy (szabolcs.n...@arm.com) wrote:
> > > > The dynamic loader has to process the LOAD segments to get to the ELF
> > > > note that says to enable BTI.  Maybe we could do a first pass and load
> > > > only the segments that cover notes.  But that requires lots of changes
> > > > to generic code in the loader.
> > >
> > > What if the loader always enabled BTI for PROT_EXEC pages, but then when
> > > discovering that this was a mistake, mprotect() the pages without BTI? 
> > > Then
> > > both BTI and MDWX would work and the penalty of not getting MDWX would 
> > > fall
> > > to non-BTI programs. What's the expected proportion of BTI enabled code 
> > > vs.
> > > disabled in the future, is it perhaps expected that a distro would enable
> > > the flag globally so eventually only a few legacy programs might be
> > > unprotected?
> >
> > i thought mprotect(PROT_EXEC) would get filtered
> > with or without bti, is that not the case?
> 
> We can adjust the filter in systemd to match any combination of
> flags to allow and to deny.

Yes but Szabolcs' point to Topi was that if we can adjust the filters to
allow mprotect(PROT_EXEC), why not allow mprotect(PROT_EXEC|PROT_BTI)
instead? Anyway, I see the MDWX and BTI as complementary policies so
ideally we shouldn't have to choose between one or the other. If we
allow mprotect(PROT_EXEC), that would override MDWX and also disable
BTI.

IIUC, the problem is with the main executable which is mapped by the
kernel without PROT_BTI. The dynamic loader wants to set PROT_BTI but
does not have the original file descriptor to be able to remap. Its only
choice is mprotect() and this fails because of the MDWX policy.

Not sure whether the kernel has the right information but could it map
the main executable with PROT_BTI if the corresponding PT_GNU_PROPERTY
is found? The current ABI states it only sets PROT_BTI for the
interpreter who'd be responsible for setting the PROT_BTI on the main
executable. I can't tell whether it would break anything but it's worth
a try:

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4784011cecac..0a08fb9133e8 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -730,14 +730,6 @@ asmlinkage void __sched arm64_preempt_schedule_irq(void)
 int arch_elf_adjust_prot(int prot, const struct arch_elf_state *state,
 bool has_interp, bool is_interp)
 {
-   /*
-* For dynamically linked executables the interpreter is
-* responsible for setting PROT_BTI on everything except
-* itself.
-*/
-   if (is_interp != has_interp)
-   return prot;
-
if (!(state->flags & ARM64_ELF_BTI))
return prot;
 

-- 
Catalin


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Lennart Poettering
On Do, 22.10.20 09:29, Szabolcs Nagy (szabolcs.n...@arm.com) wrote:

> > > The dynamic loader has to process the LOAD segments to get to the ELF
> > > note that says to enable BTI.  Maybe we could do a first pass and load
> > > only the segments that cover notes.  But that requires lots of changes
> > > to generic code in the loader.
> >
> > What if the loader always enabled BTI for PROT_EXEC pages, but then when
> > discovering that this was a mistake, mprotect() the pages without BTI? Then
> > both BTI and MDWX would work and the penalty of not getting MDWX would fall
> > to non-BTI programs. What's the expected proportion of BTI enabled code vs.
> > disabled in the future, is it perhaps expected that a distro would enable
> > the flag globally so eventually only a few legacy programs might be
> > unprotected?
>
> i thought mprotect(PROT_EXEC) would get filtered
> with or without bti, is that not the case?

We can adjust the filter in systemd to match any combination of
flags to allow and to deny.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Lennart Poettering
On Do, 22.10.20 09:05, Szabolcs Nagy (szabolcs.n...@arm.com) wrote:

> > > Various changes have been suggested, replacing the mprotect with mmap 
> > > calls
> > > having PROT_BTI set on the original mapping, re-mmapping the segments,
> > > implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
> > > and various modification to seccomp to allow particular mprotect cases to
> > > bypass the filters. In each case there seems to be an undesirable 
> > > attribute
> > > to the solution.
> > >
> > > So, whats the best solution?
> >
> > Did you see Topi's comments on the systemd issue?
> >
> > https://github.com/systemd/systemd/issues/17368#issuecomment-710485532
> >
> > I think I agree with this: it's a bit weird to alter the bits after
> > the fact. Can't glibc set up everything right from the begining? That
> > would keep both concepts working.
>
> that's hard to do and does not work for the main exe currently
> (which is mmaped by the kernel).
>
> (it's hard to do because to know that the elf module requires
> bti the PT_GNU_PROPERTY notes have to be accessed that are
> often in the executable load segment, so either you mmap that
> or have to read that, but the latter has a lot more failure
> modes, so if i have to get the mmap flags right i'd do a mmap
> and then re-mmap if the flags were not right)

Only other option I then see is to neuter one of the two
mechanisms. We could certainly turn off MDWE on arm in systemd, if
people want that. Or make it a build-time choice, so that distros make
the choice: build everything with BTI xor suppport MDWE.

(Might make sense for glibc to gracefully fallback to non-BTI mode if
the mprotect() fails though, to make sure BTI-built binaries work
everywhere.)

I figure your interest in ARM system security is bigger than mine. I
am totally fine to turn off MDWE on ARM if that's what the Linux ARM
folks want. I ave no horse in the race. Just let me know.

[An acceptable compromise might be to allow
mprotect(PROT_EXEC|PROT_BTI) if MDWE is on, but prohibit
mprotect(PROT_EXEC) without PROT_BTI. Then at least you get one of the
two protections, but not both. I mean, MDWE is not perfect anyway on
non-x86-64 already: on 32bit i386 MDWE protection is not complete, due
to ipc() syscall multiplexing being unmatchable with seccomp. I
personally am happy as long as it works fully on x86-64]

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Szabolcs Nagy
The 10/22/2020 11:17, Topi Miettinen via Libc-alpha wrote:
> On 22.10.2020 10.54, Florian Weimer wrote:
> > * Lennart Poettering:
> > > Did you see Topi's comments on the systemd issue?
> > > 
> > > https://github.com/systemd/systemd/issues/17368#issuecomment-710485532
> > > 
> > > I think I agree with this: it's a bit weird to alter the bits after
> > > the fact. Can't glibc set up everything right from the begining? That
> > > would keep both concepts working.
> > 
> > The dynamic loader has to process the LOAD segments to get to the ELF
> > note that says to enable BTI.  Maybe we could do a first pass and load
> > only the segments that cover notes.  But that requires lots of changes
> > to generic code in the loader.
> 
> What if the loader always enabled BTI for PROT_EXEC pages, but then when
> discovering that this was a mistake, mprotect() the pages without BTI? Then
> both BTI and MDWX would work and the penalty of not getting MDWX would fall
> to non-BTI programs. What's the expected proportion of BTI enabled code vs.
> disabled in the future, is it perhaps expected that a distro would enable
> the flag globally so eventually only a few legacy programs might be
> unprotected?

i thought mprotect(PROT_EXEC) would get filtered
with or without bti, is that not the case?

then i guess we can do the protection that way
around, but then i don't see why the filter cannot
treat PROT_EXEC|PROT_BTI the same as PROT_EXEC.



Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Florian Weimer
* Topi Miettinen:

>> The dynamic loader has to process the LOAD segments to get to the ELF
>> note that says to enable BTI.  Maybe we could do a first pass and
>> load only the segments that cover notes.  But that requires lots of
>> changes to generic code in the loader.
>
> What if the loader always enabled BTI for PROT_EXEC pages, but then
> when discovering that this was a mistake, mprotect() the pages without
> BTI?

Is that architecturally supported?  How costly is the mprotect change if
the pages have not been faulted in yet?

> Then both BTI and MDWX would work and the penalty of not getting
> MDWX would fall to non-BTI programs. What's the expected proportion of
> BTI enabled code vs. disabled in the future, is it perhaps expected
> that a distro would enable the flag globally so eventually only a few
> legacy programs might be unprotected?

Eventually, I expect that mainstream distributions build everything for
BTI, so yes, the PROT_BTI removal would only be needed for legacy
programs.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Topi Miettinen

On 22.10.2020 10.54, Florian Weimer wrote:

* Lennart Poettering:


On Mi, 21.10.20 22:44, Jeremy Linton (jeremy.lin...@arm.com) wrote:


Hi,

There is a problem with glibc+systemd on BTI enabled systems. Systemd
has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is
caught by the seccomp filter, resulting in service failures.

So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
This is obviously not desirable.

Various changes have been suggested, replacing the mprotect with mmap calls
having PROT_BTI set on the original mapping, re-mmapping the segments,
implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
and various modification to seccomp to allow particular mprotect cases to
bypass the filters. In each case there seems to be an undesirable attribute
to the solution.

So, whats the best solution?


Did you see Topi's comments on the systemd issue?

https://github.com/systemd/systemd/issues/17368#issuecomment-710485532

I think I agree with this: it's a bit weird to alter the bits after
the fact. Can't glibc set up everything right from the begining? That
would keep both concepts working.


The dynamic loader has to process the LOAD segments to get to the ELF
note that says to enable BTI.  Maybe we could do a first pass and load
only the segments that cover notes.  But that requires lots of changes
to generic code in the loader.


What if the loader always enabled BTI for PROT_EXEC pages, but then when 
discovering that this was a mistake, mprotect() the pages without BTI? 
Then both BTI and MDWX would work and the penalty of not getting MDWX 
would fall to non-BTI programs. What's the expected proportion of BTI 
enabled code vs. disabled in the future, is it perhaps expected that a 
distro would enable the flag globally so eventually only a few legacy 
programs might be unprotected?


-Topi


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Szabolcs Nagy
The 10/22/2020 09:18, Lennart Poettering wrote:
> On Mi, 21.10.20 22:44, Jeremy Linton (jeremy.lin...@arm.com) wrote:
> 
> > Hi,
> >
> > There is a problem with glibc+systemd on BTI enabled systems. Systemd
> > has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
> > PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
> > being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is
> > caught by the seccomp filter, resulting in service failures.
> >
> > So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
> > This is obviously not desirable.
> >
> > Various changes have been suggested, replacing the mprotect with mmap calls
> > having PROT_BTI set on the original mapping, re-mmapping the segments,
> > implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
> > and various modification to seccomp to allow particular mprotect cases to
> > bypass the filters. In each case there seems to be an undesirable attribute
> > to the solution.
> >
> > So, whats the best solution?
> 
> Did you see Topi's comments on the systemd issue?
> 
> https://github.com/systemd/systemd/issues/17368#issuecomment-710485532
> 
> I think I agree with this: it's a bit weird to alter the bits after
> the fact. Can't glibc set up everything right from the begining? That
> would keep both concepts working.

that's hard to do and does not work for the main exe currently
(which is mmaped by the kernel).

(it's hard to do because to know that the elf module requires
bti the PT_GNU_PROPERTY notes have to be accessed that are
often in the executable load segment, so either you mmap that
or have to read that, but the latter has a lot more failure
modes, so if i have to get the mmap flags right i'd do a mmap
and then re-mmap if the flags were not right)


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Florian Weimer
* Lennart Poettering:

> On Mi, 21.10.20 22:44, Jeremy Linton (jeremy.lin...@arm.com) wrote:
>
>> Hi,
>>
>> There is a problem with glibc+systemd on BTI enabled systems. Systemd
>> has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
>> PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
>> being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is
>> caught by the seccomp filter, resulting in service failures.
>>
>> So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
>> This is obviously not desirable.
>>
>> Various changes have been suggested, replacing the mprotect with mmap calls
>> having PROT_BTI set on the original mapping, re-mmapping the segments,
>> implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
>> and various modification to seccomp to allow particular mprotect cases to
>> bypass the filters. In each case there seems to be an undesirable attribute
>> to the solution.
>>
>> So, whats the best solution?
>
> Did you see Topi's comments on the systemd issue?
>
> https://github.com/systemd/systemd/issues/17368#issuecomment-710485532
>
> I think I agree with this: it's a bit weird to alter the bits after
> the fact. Can't glibc set up everything right from the begining? That
> would keep both concepts working.

The dynamic loader has to process the LOAD segments to get to the ELF
note that says to enable BTI.  Maybe we could do a first pass and load
only the segments that cover notes.  But that requires lots of changes
to generic code in the loader.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Lennart Poettering
On Mi, 21.10.20 22:44, Jeremy Linton (jeremy.lin...@arm.com) wrote:

> Hi,
>
> There is a problem with glibc+systemd on BTI enabled systems. Systemd
> has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
> PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
> being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is
> caught by the seccomp filter, resulting in service failures.
>
> So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
> This is obviously not desirable.
>
> Various changes have been suggested, replacing the mprotect with mmap calls
> having PROT_BTI set on the original mapping, re-mmapping the segments,
> implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
> and various modification to seccomp to allow particular mprotect cases to
> bypass the filters. In each case there seems to be an undesirable attribute
> to the solution.
>
> So, whats the best solution?

Did you see Topi's comments on the systemd issue?

https://github.com/systemd/systemd/issues/17368#issuecomment-710485532

I think I agree with this: it's a bit weird to alter the bits after
the fact. Can't glibc set up everything right from the begining? That
would keep both concepts working.

Lennart

--
Lennart Poettering, Berlin