Re: [systemd-devel] Getting delegated IPv6 prefix from systemd-networkd

2022-10-23 Thread Topi Miettinen

On 23.10.2022 13.57, Arseny Maslennikov wrote:

On Sun, Oct 23, 2022 at 09:30:16AM +0300, Topi Miettinen wrote:

On 22.10.2022 15.55, Ian Pilcher wrote:

On 10/19/22 17:05, Ian Pilcher wrote:

(I know that systemd-networkd can allocate subnets from the delegated
prefix to local interfaces, but that's not what I need to do in this
case.)

Is there an API of some sort that my scripts can use?  A file that I can
monitor/parse?  A command that I can run?



Anyone?



For firewall integration, perhaps PR #24570 (once merged) could be useful
for you. Then the assigned prefix can be used in NFTables rules with NFT
sets.


Topi, thank you for the effort, this looks great at first glance! What's
especially cool about this solution is it seems to relieve networkd from
the burden of maintaining a special custom API, reusing a Linux feature
instead.



myif.network:
[DHCPPrefixDelegation]
NFTSet=inet:myfilter:myif_ipv6_address

nftables.conf:
table inet myfilter {
 set myif_ipv6_address {
 type ipv6_addr
 flags interval
 }

 chain myservice_input {
 ip saddr @myif_ipv6_address accept
 drop
 }
}

This is better than using networkd-dispatcher or NetworkManager scripts,
since the sets will be updated immediately by networkd when the addresses
are available, but of course it can't cover all possible cases where scripts
could be used.


On the other hand, scripts (or daemons running indefinitely) can query
the data stored in the nft sets and monitor modifications to those sets.
That way, the set objects can be used as a generic communication medium,
not necessarily being part of firewall state. (This might even work for
people who prefer some alternatives to nftables: they won't have to use
the rule engine. Those ppl can argue the same way about ipset, though)

The following example shows how to monitor nft objects with a shell and
nft(1) CLI:

   [root@host ~]# nft -j monitor elements | grep --color -F 'baobab' &
   [1] 1547610
   [root@host ~]# nft --json list set inet default baobab | jq '.nftables[1:]'
   [
 {
   "set": {
 "family": "inet",
 "name": "baobab",
 "table": "default",
 "type": "ipv6_addr",
 "handle": 11,
 "flags": [
   "interval"
 ],
 "elem": [
   {
 "prefix": {
   "addr": "2001:db8:8::",
   "len": 48
 }
   },
   {
 "prefix": {
   "addr": "2001:db8:9::",
   "len": 48
 }
   }
 ]
   }
 }
   ]
   [root@host ~]#
   [root@host ~]# nft add element inet default baobab '{' 2001:db8:a::/48 '}' 
>/dev/null 2>&1
   {"add": {"element": {"family": "inet", "table": "default", "name": "baobab", "elem": {"set": [{"prefix": 
{"addr": "2001:db8:a::", "len": 48}}]
   [root@host ~]# nft flush set inet default baobab >/dev/null 2>&1
   {"delete": {"element": {"family": "inet", "table": "default", "name": "baobab", "elem": {"set": 
[{"prefix": {"addr": "2001:db8:a::", "len": 48}}]
   {"delete": {"element": {"family": "inet", "table": "default", "name": "baobab", "elem": {"set": 
[{"prefix": {"addr": "2001:db8:9::", "len": 48}}]
   {"delete": {"element": {"family": "inet", "table": "default", "name": "baobab", "elem": {"set": 
[{"prefix": {"addr": "2001:db8:8::", "len": 48}}]
   [root@host ~]# nft add element inet default baobab '{' 2001:db8:9::/48 '}' 
>/dev/null 2>&1
   {"add": {"element": {"family": "inet", "table": "default", "name": "baobab", "elem": {"set": [{"prefix": 
{"addr": "2001:db8:9::", "len": 48}}]
   [root@host ~]# nft add element inet default baobab '{' 2001:db8:8::/48 '}' 
>/dev/null 2>&1
   {"add": {"element": {"family": "inet", "table": "default", "name": "baobab", "elem": {"set": [{"prefix": 
{"addr": "2001:db8:

Re: [systemd-devel] Getting delegated IPv6 prefix from systemd-networkd

2022-10-23 Thread Topi Miettinen

On 22.10.2022 15.55, Ian Pilcher wrote:

On 10/19/22 17:05, Ian Pilcher wrote:

I've managed to get systemd-networkd to request and IPv6 prefix from my
ISP, and I can see it the journal.  Now I need to programmatically
retrieve it, so that my home-grown scripts can set up the various
routes, address, firewall rules, etc.

(I know that systemd-networkd can allocate subnets from the delegated
prefix to local interfaces, but that's not what I need to do in this
case.)

Is there an API of some sort that my scripts can use?  A file that I can
monitor/parse?  A command that I can run?

My searches thus far are coming up disturbingly empty ...



Anyone?



For firewall integration, perhaps PR #24570 (once merged) could be 
useful for you. Then the assigned prefix can be used in NFTables rules 
with NFT sets.


This is better than using networkd-dispatcher or NetworkManager scripts, 
since the sets will be updated immediately by networkd when the 
addresses are available, but of course it can't cover all possible cases 
where scripts could be used.


myif.network:
[DHCPPrefixDelegation]
NFTSet=inet:myfilter:myif_ipv6_address

nftables.conf:
table inet myfilter {
set myif_ipv6_address {
type ipv6_addr
flags interval
}

chain myservice_input {
ip saddr @myif_ipv6_address accept
drop
}
}

-Topi



Re: [systemd-devel] Getting rid of the /run/credentials mount

2022-08-25 Thread Topi Miettinen

On 25.8.2022 22.42, Marc Haber wrote:

Hi,

the aide (https://github.com/aide/aide) tool builds checksums of files
on the system and sends an alert if things change on the system. In the
Debian package, this is done from cron. I would like to move that to a
systemd timer and in passing use some of systemd's security features.
Here is my service:

[Unit]
Description=dailyaide check
StartLimitIntervalSec=7200
StartLimitBurst=1

[Service]
Type=oneshot
User=root
Group=root
Environment="CREDENTIALS_DIRECTORY=/nonexistent"
ProtectSystem=strict
ProtectClock=yes
ProtectKernelModules=no
ProtectKernelLogs=yes
ProtectControlGroups=yes
PrivateDevices=no
ProtectKernelTunables=yes
ProtectControlGroups=yes
ProtectHome=read-only
ReadWritePaths=/run/aide /var/lib/aide /var/log/aide /var/spool/exim4 
/var/log/exim4 /var/tmp /tmp
RestrictRealtime=yes
RestrictSUIDSGID=yes
PrivateTmp=no
ExecStartPre-=/bin/umount /run/credentials
ExecStart=/usr/local/sbin/dailyaidecheck --systemdservice

You might see that I have tried some things to get rid of the mount of
/run/credentials which allows an attacker to hide something in
/run/credentials without aide being able to see it because it gets some
temporary filesystem mounted over that path.

Unfortunately, neither of those tricks have worked, and my
/run/credentials/foo that I created before starting my service remains
undetected.

What do I do to disable the credentials mechanism in my service?


You could use TemporaryFileSystem=/run together with a few BindPaths= 
for the required directories. For example, on my setup the user doesn't 
see all cruft in global /run:

$ ls /run
dbus/  firejail/  systemd/  udev/  user/

See also
https://github.com/systemd/systemd/pull/21748
for some thoughts on tentative new directive PrivateRun= or something 
similar.


-Topi



Greetings
Marc






Re: [systemd-devel] [EXTERNAL] Re: [RFC] systemd-resolved: Send d-bus signal after DNS resolution

2022-08-19 Thread Topi Miettinen

On 18.8.2022 19.39, Suraj Krishnan wrote:

Hi all,

I wanted to revive this thread and provide an update on the status of this 
feature/PR.

The feature adds a mechanism for privileged users to monitor DNS resolutions on 
the system, by adding a new varlink interface that exposes a method for clients 
to subscribe to such notifications. The feature will be off by default. 
Lennart's suggestion to use varlink for this turned out to be quite valuable 
and we've been successfully using this patch for the last few months. We're 
using it to update the firewall (drop by default) based on pre-configured 
allow-list of hostnames. We also leverage nftable's queue feature to address 
race conditions associated with asynchronously updating the network firewall.

I received great feedback from the community on the PR, much of which is 
already incorporated. There are two more suggestions that aren't incorporated 
yet:
1) Add "resolvectl monitor" functionality to provide a built-in way to monitor 
the notifications and try out the feature easily (note that varlink CLI tools are just as 
easy to setup/use)
2) Add a d-bus property when the feature is in use. This allows unprivileged 
clients to be aware that DNS requests are monitored on the system.

At this point, I'm writing to gauge if the devs would be open to accepting this 
patch in its current form, or would like to have 1) and/or 2) incorporated into 
the same PR, or have any concerns about the feature in general.
https://github.com/systemd/systemd/pull/22845


I planned to do something similar with
https://github.com/systemd/systemd/issues/17053
and https://github.com/systemd/systemd/pull/17126
but I didn't find a good solution for updating the firewall by resolved 
itself or indirectly with NFT sets. An external daemon is probably the 
best choice and then varlink interface makes a lot of sense. I don't 
need the monitor or d-bus properties.


For simple resolver clients like systemd-timesyncd, which only needs a 
few names resolved and then it's happy to use the resulting IP addresses 
forever, the firewall doesn't need much managing, perhaps a NFT flush 
when the client exits or restarts. It would be nice to use the filtering 
and firewalling for more complex use cases like browser but then the 
lifetime of the firewall rules and when to drop them isn't very clear to 
me. Identifying the client robustly to the level I want seems to be also 
difficult, the best would be to use a combination of stuff like cgroups, 
UID/GID and SELinux domain of the client. May I ask what's your use case 
for the firewall integration?


-Topi



Thanks
Suraj

-Original Message-
From: Lennart Poettering 
Sent: Wednesday, February 16, 2022 9:53 AM
To: Suraj Krishnan 
Cc: systemd-devel@lists.freedesktop.org; Andre Muezerie 

Subject: [EXTERNAL] Re: [systemd-devel] [RFC] systemd-resolved: Send d-bus 
signal after DNS resolution

[You don't often get email from lenn...@poettering.net. Learn why this is 
important at http://aka.ms/LearnAboutSenderIdentification.]

On Di, 15.02.22 22:37, Suraj Krishnan (sura...@microsoft.com) wrote:


Hello,

I'm reaching out to the community to gather feedback about a feature
to broadcast a d-bus signal notification from systemd-resolved when a
DNS query is completed. The message would contain information about
the query and IP addresses received from the DNS server.


Broadcasting this on the system bus sounds like a bit too heavy. I am sure 
there are setups which will resolve a *lot* of names in a very short time, and 
you'd flood the bus with that. D-Bus is expressly not built for streaming more 
than control data, but if you have a flood of DNS requests it becomes 
substantially more than that.

Also, given that in 99.9%of all cases the broadcast messages would just be 
dropped by the broker because nothig is listening this sounds needlessly 
expensive.

What would make sense is adding a Varlink interface for this however. resolved 
uses varlink anyway it could just build on that. Varlink has the benefit that 
no broker is involved: if noone is listening we wouldn't do anything and not 
have to pay for it. Moreover varlink has no issues with streaming large amounts 
of data. And its easy to secure to ensure nobody unprivileged will see this 
(simply by making the socket have a restrictive access mode).

So yes, i think adding the concept makes a ton of sense. But not via D-Bus, but via 
Varlink. Would love to review/merge a patch that adds that and then exposes this via 
"resolvectl monitor" or so.

Lennart

--
Lennart Poettering, Berlin




Re: [systemd-devel] Q: Start network in chroot?

2022-06-13 Thread Topi Miettinen

On 13.6.2022 11.09, Ulrich Windl wrote:

Hi!

Two questions:
1) Why can't I use "systemctl start network" in a chroot environment (e.g. mounting the 
system from a rescue medium to fix a defective kernel)? When I try I get: "Running in chroot, 
ignoring command 'start'"


From docs/ENVIRONMENT.md:
* `$SYSTEMD_IGNORE_CHROOT=1` — if set, don't check whether being invoked 
in a
  `chroot()` environment. This is particularly relevant for systemctl, 
as it
  will not alter its behaviour for `chroot()` environments if set. 
Normally it
  refrains from talking to PID 1 in such a case; turning most 
operations such

  as `start` into no-ops.  If that's what's explicitly desired, you might
  consider setting `$SYSTEMD_OFFLINE=1`.

-Topi


Re: [systemd-devel] the need for a discoverable sub-volumes specification

2021-11-11 Thread Topi Miettinen

On 11.11.2021 19.27, Lennart Poettering wrote:

On Mi, 10.11.21 10:34, Topi Miettinen (toiwo...@gmail.com) wrote:


Doing this RootDirectory= would make a ton of sense too I guess, but
it's not as obvious there: we'd need to extend the setting a bit I
think to explicitly enable this logic. As opposed to the RootImage=
case (where the logic should be default on) I think any such logic for
RootDirectory= should be opt-in for security reasons because we cannot
safely detect environments where this logic is desirable and discern
them from those where it isn't. In RootImage= we can bind this to the
right GPT partition type being used to mark root file systems that are
arranged for this kind of setup. But in RootDirectory= we have no
concept like that and the stuff inside the image is (unlike a GPT
partition table) clearly untrusted territory, if you follow what I am
babbling.


My images don't have GPT partition tables, they are just raw squashfs file
systems. So I'd prefer a way to identify the version either by contents of
the image (/@auto/ directory), or something external, like name of the image
(/path/to/image/foo.version-X.Y). Either option would be easy to implement
when generating the image or directory.


Hmm, so thinking about this again, I think we might get away with a
check "/@auto/ exists and /usr/ does not". i.e. the second part of the
check removes any ambiguity: since we unified the OS in /usr it's an
excellent way to check if something is or could be an OS tree.

That said: naked squashfs sucks. Always wrap your squashfs in a GPT
wrapper to make things self-descriptive.


It would be an extra step in image making process. I think naming the 
images '*.squashfs' documents them well enough for me.



But if you have several RootDirectories or RootImages available for a
service, what would be the way to tell which ones should be tried if there's
no GPT? They can't all have the same name. I think using a specifier (like
%q) would solve this issue nicely and there wouldn't be a need for /@auto/
in that case.


A specifier is resolved at unit file load time only. It wouldn#t be the
right fit here, since we don#t want to require that the paths
specified in RootDirectory=/RootImage= are already accessible at the
time PID 1 reads/parses the unit file.


Well, that's out then.


What about this: we could entirely independently of the proposal
originally discussed here teach RootDirectory= + RootImage= one magic
trick: if the path specified ends in ".auto.d/" (or so) then we'll not
actually use the dir/image as-is but assume the path refers to a
directory, and we'd pick the newest entry inside it as decided by
strverscmp().

Or in other words, we'd establish the general rule that dirs ending in
".auto.d/" contains versioned resources inside, that we could apply
here and everywhere else where it fits, too.

of course intrdocuing this rule would be kind of a compat breakage
because if anyone happened to have named their dirs like that already
we'd suddenly do weird stuff with it the user might not expect. But I
think I could live with that.

A patch for that should be pretty easy to do, and be very generically
useful. I kinda like it. What do you think?


Yes, that could work. I'd need to rename the LVM VG to 'levy.auto.d' 
(maybe instead create a new VG just for these images) and the directory 
too but that's no problem.


-Topi


Re: [systemd-devel] the need for a discoverable sub-volumes specification

2021-11-10 Thread Topi Miettinen

On 9.11.2021 23.03, Lennart Poettering wrote:

On Di, 09.11.21 19:48, Topi Miettinen (toiwo...@gmail.com) wrote:


i.e. we'd drop the counting suffix.


Could we have this automatic versioning scheme extended also to service
RootImages & RootDirectories as well? If the automatic versioning was also
extended to services, we could have A/B testing also for RootImages with
automatic fallback to last known good working version.


At least in the case of RootImage= this was my implied assumption:
we'd implement the same there, since that uses the exact same code as
systemd-nspawn's image dissection and we definitely want it there.

Doing this RootDirectory= would make a ton of sense too I guess, but
it's not as obvious there: we'd need to extend the setting a bit I
think to explicitly enable this logic. As opposed to the RootImage=
case (where the logic should be default on) I think any such logic for
RootDirectory= should be opt-in for security reasons because we cannot
safely detect environments where this logic is desirable and discern
them from those where it isn't. In RootImage= we can bind this to the
right GPT partition type being used to mark root file systems that are
arranged for this kind of setup. But in RootDirectory= we have no
concept like that and the stuff inside the image is (unlike a GPT
partition table) clearly untrusted territory, if you follow what I am
babbling.


My images don't have GPT partition tables, they are just raw squashfs 
file systems. So I'd prefer a way to identify the version either by 
contents of the image (/@auto/ directory), or something external, like 
name of the image (/path/to/image/foo.version-X.Y). Either option would 
be easy to implement when generating the image or directory.


But if you have several RootDirectories or RootImages available for a 
service, what would be the way to tell which ones should be tried if 
there's no GPT? They can't all have the same name. I think using a 
specifier (like %q) would solve this issue nicely and there wouldn't be 
a need for /@auto/ in that case.



Or in other words: to enable this for RootDirectory= we probably need
a new option RootDirectoryVersioned= or so that takes a boolean.


Wouldn't this be unnecessary, if the version magic would be available 
explicitly as specifier to the path of RootDirectory= or RootImage=? 
Then we know that the configuring user made this decision.


-Topi


Re: [systemd-devel] the need for a discoverable sub-volumes specification

2021-11-09 Thread Topi Miettinen

On 8.11.2021 17.32, Lennart Poettering wrote:

Besides the GPT auto-discovery where versioning is implemented the way
I mentioned, there's also the sd-boot boot loader which does roughly
the same kind of OS versioning with the boot entries it discovers. So
right now, you can already chose whether:

1. you want to do OS versioning on the boot loader entry level: name
your EFI binary fooos-0.1.efi (or fooos-0.1.conf, as defined by the
boot loader spec) and similar and the boot loader automatically
picks it up, makes sense of it and boots the newest version
installed.

2. you want to do OS versioning on the GPT partition table level: name
your partitions "fooos-0.1" and similar, with the right GPT type,
and tools such as systemd-nspawn, systemd-dissect, portable
services, RootImage= in service unit files all will be able to
automatically pick the newest version of the OS among the ones in
the image.

and now:

3. If we implement what I proprose above then you could do OS version
on the file system level too.

(Or you could do a combination of the above, if you want — which is
highly desirable I think in case you want a universal image that can
boot on bare metal and in nspawn in a nice versioned way.)

Now, in sd-boot's versioning logic we implement an automatic boot
assesment logic on top of the OS versioning: if you add a "+x-y"
string into the boot entry name we use it as x=tries-left and
y=tries-done counters. i.e. fooos-0.1+3-0.efi is semantically the same
as fooos-0.1.efi, except that there are 3 attempts left and 0 done
yet. On each boot attempt the boot loader decreases x and increases
y. i.e. fooos-0.1+3-0.efi → fooos-0.1+2-1.efi → fooos-0.1+1-2.efi →
fooos-0.1+0-3.efi. If a boot succeeds the two counters are dropped
from the filename, i.e. → fooos-0.1.efi.

For details see: https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT.

Now, why am I mentioning all this? Right now this assessment counter
logic is only implemented for the OS versioning as implemented by
sd-boot. But I think it would make a ton of sense to implement the
same scheme for the GPT partition table OS versioning, and then also
for the fs-level OS versioning as proposed in this thread.

Or to say this explicitly: we could define the spec to say that if
we encounter:

/@auto/root-x86-64:fedora_36.0+3-0

on first boot attempt we'd rename it:

/@auto/root-x86-64:fedora_36.0+2-1

and so on. Until boot succeeds in which case we'd rename it:

/@auto/root-x86-64:fedora_36.0

i.e. we'd drop the counting suffix.


Could we have this automatic versioning scheme extended also to service 
RootImages & RootDirectories as well? If the automatic versioning was 
also extended to services, we could have A/B testing also for RootImages 
with automatic fallback to last known good working version.


In my setup, all services use either a RootImage= or RootDirectory= (for 
early boot services). Most of them don't care about kernel version, so 
the services use a shared drop-in (LVM logical volume 'levy'):


[Service]
RootImage=/dev/levy/%p-all.squashfs

The device path will then be for example 
/dev/levy/systemd-networkd-all.squashfs.


For udev and systemd-modules, kernel version is used 
(/usr/local/lib/rootimages/systemd-udevd-5.14.0-2-amd64.dir), so the 
services use this drop-in:


[Service]
RootDirectory=/usr/local/lib/rootimages/%p-%v.dir

Instead of (or in addition to) /@auto/ paths inside the RootImage= / 
RootDirectory=, the version could be available as modifier to part of 
device or directory pathname, for example:


[Service]
RootImage=/dev/levy/%p-all-@auto.squashfs

or

[Service]
RootImage=/usr/local/lib/rootimages/%p-%v-@auto.squashfs

Maybe %a instead of @auto.

This would then match 
/dev/levy/systemd-networkd-all-2021-11.09.0.squashfs as the highest 
version, but if that refuses to start, PID1 would try to start 
/dev/levy/systemd-networkd-all-2021-11.08.2.squashfs instead.


-Topi


Re: [systemd-devel] best way to enable dynamicuser on a large custom application

2021-02-12 Thread Topi Miettinen

On 12.2.2021 4.31, Davis Roman wrote:

Hello,

I've been tasked to take a large application mostly written in C which 
had previously always run as root and now run it under dynamic user.


My goal is to follow the "principle of least privilege" and figure out 
all the necessary individual privileges I need to provide so that it 
continues to work normally as before.


I'm sure I can use a trial and error approach that would involve running 
the unprivileged application, inspecting error, granting needed 
privilege, rinse, wash and repeat until all errors are resolved
but I'm wondering if there is a more systematic approach that involves 
inspecting the code base and figuring out all needed privileges needed 
to get the application to work properly?


Code base inspection may not present the complete picture easily since 
libraries may also invoke privileges and they may also call further 
libraries, plugins etc. Test-based approach is not great either, since 
if the test run misses a use case, the privileges for that may be left out.


Once upon time, I tried to automate generating systemd configuration 
from test runs with some help from kernel using SystemTap. It may need 
some updating:

https://github.com/topimiettinen/systemd-settings-generator.git

But a problem with kernel's debugging interfaces is that they don't seem 
to remain very stable and for some interfaces, performance may not be 
good enough. Maybe a more stable and performant model would be to 
intercept the LSM interface directly. This "systemd-lsm" could store the 
state inside the kernel (no need for I/O) and it could produce systemd 
configuration after the service finished with netlink (for speed) or 
/proc interface (for text).


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Antw: [EXT] Re: Creating executable device nodes in /dev?

2020-12-16 Thread Topi Miettinen

On 16.12.2020 12.03, Ulrich Windl wrote:

Jarkko Sakkinen  schrieb am 15.12.2020 um 05:19 in

Nachricht
<20201215041903.ga21...@kernel.org>:

On Mon, Dec 14, 2020 at 08:25:50AM +0100, Ulrich Windl wrote:

Topi Miettinen  schrieb am 11.12.2020 um 12:46 in

Nachricht
<27796c04-249e-6cf0-c3e1-0fd657a82...@gmail.com>:

On 11.12.2020 12.46, Jarkko Sakkinen wrote:

On Wed, Dec 09, 2020 at 10:35:21AM +0200, Topi Miettinen wrote:

On 9.12.2020 2.15, Jarkko Sakkinen wrote:

On Wed, Dec 09, 2020 at 01:15:27AM +0200, Topi Miettinen wrote:

As  a further argument, I just did this on a Fedora system:
$ find /dev ‑perm /ugo+x ‑a \! ‑type d ‑a \! ‑type l
No results.  So making /dev noexec doesn't seem to have any

benefit.


It's no surprise that there aren't any executables in /dev since
removing MAKEDEV ages ago. That's not the issue, which is that
/dev is a writable directory (for UID=0 but no capabilities are
needed) and thus a potential location for constructing unapproved
executables if it is also mounted exec (W^X).


UID 0 can just change mount options, though, unless SELinux or

similar

is

used. And SELinux can protect /dev just fine without noexec.


Well, mounting would need CAP_SYS_ADMIN in addition to UID 0. Also

SELinux

is not universal and the policies might not contain all users or

services.


‑Topi


What's the data that supports having noexec /dev anyway? With root
access I can then just use something else like /dev/shm mount.

Has there been out in the wild real world cases that noexec mount
of would have prevented?

For me this sounds a lot just something that "feels more secure"
without any measurable benefit. Can you prove me wrong?


I don't think security works that way. An attacker has various methods

to

choose from, some are more interesting than others. The case where

rw,exec

/dev would be interesting would imply that easier or more common

avenues

would be blocked, for example rw,exec /dev/shm, /tmp, /var/tmp, or
/run/user/$UID/ for user. Also fileless malware with pure ROP/JOP

approach

with no need for any file system access is getting more common. It

does

not

mean that it would not be prudent to block the relatively easy

approaches

too, including /dev.


What if we add a new mount option "chrexec", which allows exec
for character devices (S_IFCHR).


I think devices are a bad match for SGX because devices haven't been
executable and SGX is actually an operation for memory. So something
like memfd_create(, MFD_SGX) or mmap(,, PROT_READ|PROT_EXEC|PROT_SGX)
would be much more natural. Even better would be something that
conceptully also works for AMD version (either with the same flags or
MFD_SGX / MFD_whatever_the_AMD_version_is).


+1


SGX reserved memory from kernel's point of view is IO memory.

Mapping SGX to memfd would not be a great idea, as it does not map
into concept of anonymous file backed by regular memory.

A device file is very natural match actually. We have ioctl API for
uploading enclave pages during the build procedure to the enclave and
custom #PF handler. Conceptually it's a lot like video memory or such
special device specific memory area.

There's no AMD equivalent of this technology.


Hi!

Back to "noexec": AFAIR the execute bit does not make sense for device files,
and the purpose probably was to avoid execution of non-device files (e.g.
regular executables) from inside /dev (where they should not be). So in this
view "noexec" makes sense.
There were similar arguments for not allowing device files in user
directories.


PR#17940 (https://github.com/systemd/systemd/pull/17940) was merged, so 
/dev will now on be mounted with "exec" by systemd.


I made issue #17942 (https://github.com/systemd/systemd/issues/17942) to 
discuss related hardening options. I'm leaning towards 
NoExecPaths=/ExecPaths= as it would enable nice hardening by 
allow-listing of all executable content for system services with simple 
directives like:


[Service]
NoExecPaths=/
ExecPaths=/usr/sbin/daemon /lib64/ld-linux-x86-64.so.2 /usr/lib

Then a service infected with malware would not be able to execute a 
shell present in the system or downloaded later, if that was not 
explicitly allowed. /dev would also not have "exec" flag by default, but 
SGX could be allowed with "ExecPaths=/dev/sgx" when needed.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-11 Thread Topi Miettinen

On 11.12.2020 12.46, Jarkko Sakkinen wrote:

On Wed, Dec 09, 2020 at 10:35:21AM +0200, Topi Miettinen wrote:

On 9.12.2020 2.15, Jarkko Sakkinen wrote:

On Wed, Dec 09, 2020 at 01:15:27AM +0200, Topi Miettinen wrote:

As  a further argument, I just did this on a Fedora system:
$ find /dev -perm /ugo+x -a \! -type d -a \! -type l
No results.  So making /dev noexec doesn't seem to have any benefit.


It's no surprise that there aren't any executables in /dev since
removing MAKEDEV ages ago. That's not the issue, which is that
/dev is a writable directory (for UID=0 but no capabilities are
needed) and thus a potential location for constructing unapproved
executables if it is also mounted exec (W^X).


UID 0 can just change mount options, though, unless SELinux or similar is used. 
And SELinux can protect /dev just fine without noexec.


Well, mounting would need CAP_SYS_ADMIN in addition to UID 0. Also SELinux
is not universal and the policies might not contain all users or services.

-Topi


What's the data that supports having noexec /dev anyway? With root
access I can then just use something else like /dev/shm mount.

Has there been out in the wild real world cases that noexec mount
of would have prevented?

For me this sounds a lot just something that "feels more secure"
without any measurable benefit. Can you prove me wrong?


I don't think security works that way. An attacker has various methods to
choose from, some are more interesting than others. The case where rw,exec
/dev would be interesting would imply that easier or more common avenues
would be blocked, for example rw,exec /dev/shm, /tmp, /var/tmp, or
/run/user/$UID/ for user. Also fileless malware with pure ROP/JOP approach
with no need for any file system access is getting more common. It does not
mean that it would not be prudent to block the relatively easy approaches
too, including /dev.


What if we add a new mount option "chrexec", which allows exec
for character devices (S_IFCHR).


I think devices are a bad match for SGX because devices haven't been 
executable and SGX is actually an operation for memory. So something 
like memfd_create(, MFD_SGX) or mmap(,, PROT_READ|PROT_EXEC|PROT_SGX) 
would be much more natural. Even better would be something that 
conceptully also works for AMD version (either with the same flags or 
MFD_SGX / MFD_whatever_the_AMD_version_is).


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-09 Thread Topi Miettinen

On 9.12.2020 17.14, Andy Lutomirski wrote:



On Dec 9, 2020, at 12:58 AM, Topi Miettinen  wrote:

On 9.12.2020 2.42, Jarkko Sakkinen wrote:

On Wed, Dec 09, 2020 at 02:15:28AM +0200, Jarkko Sakkinen wrote:
On Wed, Dec 09, 2020 at 01:15:27AM +0200, Topi Miettinen wrote:

As  a further argument, I just did this on a Fedora system:
$ find /dev -perm /ugo+x -a \! -type d -a \! -type l
No results.  So making /dev noexec doesn't seem to have any benefit.


It's no surprise that there aren't any executables in /dev since
removing MAKEDEV ages ago. That's not the issue, which is that
/dev is a writable directory (for UID=0 but no capabilities are
needed) and thus a potential location for constructing unapproved
executables if it is also mounted exec (W^X).


UID 0 can just change mount options, though, unless SELinux or similar is used. 
And SELinux can protect /dev just fine without noexec.


Well, mounting would need CAP_SYS_ADMIN in addition to UID 0. Also SELinux
is not universal and the policies might not contain all users or services.

-Topi


What's the data that supports having noexec /dev anyway? With root
access I can then just use something else like /dev/shm mount.

Has there been out in the wild real world cases that noexec mount
of would have prevented?

Typo: "of" = "of /dev"

For me this sounds a lot just something that "feels more secure"
without any measurable benefit. Can you prove me wrong?

The debate is circled around something not well defined. Of course you
get theoretically more safe system when you decrease priviliges anywhere
in the system. Like you could start do grazy things with stuff that
unprivilged user has access, in order to prevent malware to elevate to
UID 0 in the first place.
I think where this go intellectually wrong is that we are talking about
*default installation* of a distribution. That should have somewhat sane
common sense access control settings. For like a normal desktop user
noexec /dev will not do any possible favor.
Then there is the case when you want to harden installation for an
application, let's' say some server. In that case you will anyway
fine-tune the security settings and go grazy enough with hardening.
When you tailor a server, it's a standard practice to enumerate and
adjust the mount points if needed.


I think we agree that there's a need for either way to allow SGX (if default is 
hardened) or a hardening option (in the opposite case). For systemd I see two 
approaches:

1. Default noexec /dev, override with something like
- ExecPaths=/dev
- MountOptions=/dev:rw,exec,dev,nosuid
- or even MountOptions=/dev/sgx:rw,exec,dev,nosuid
- ProtectDev=no
- AllowSGX=yes

2. Default exec /dev, override with
- NoExecPaths=/dev
- MountOptions=/dev:rw,noexec,dev,nosuid
- ProtectDev=yes
- DenySGX=yes

I'd prefer 1. but of course 2. would be reasonable.


I would argue for 2, for the following reason.  I absolutely agree that 
hardening a system by making it impossible to create executable code 
dynamically is valuable, but I don’t think it’s a good default. By default, 
programs like gcc and clang should work, but so should JITs, and JITs are 
getting more popular and powerful all the time.  In a default setting that 
allows JITs, etc, I see no benefit at all to making /dev noexec. To the 
contrary, making /dev noexec seems like plugging a little restricted corner of 
code creation (because it requires UID=0) while allowing the easy ways (/tmp, 
/home, /dev/shm, unshare(2), mmap(), etc).  By all means let admins harden 
this, but I see no reason to apply some of the hardening when the rest is 
disabled.


Makes sense, especially if anything in theory could be expected to use 
SGX. In practice, probably no system services will at least initially, 
so hardening knobs make also sense.







To summarize, I neither understand the intended target audience.


We have something in common: me neither. What's the target audience for SGX? 
What's the use case? What are the users: browsers, system services? How would 
applications use SGX? Should udev rules for /dev/sgx make it available to any 
logged in users with uaccess tags?




I would certainly like it to be available to all software, with the possible 
exception of extra-hardened systems. Using SGX is not really an interesting 
attack surface. The main threat is that malware might use SGX to make itself 
hard to reverse engineer.


Maybe also malware which can escape all means of detection, enforced by 
the CPU? Though I don't know if any malware scanners for Linux work can 
check for fileless, memory only malware.




In Intel’s original vision, only specially licensed vendors could create SGX 
software, but Linux pushed back against this quite hard, and new CPUs allow 
unlicensed enclaves. So your Skylake CPUs support SGX, but not on Linux.


Kudos to Linux for the push.

-Topi
___
systemd-devel mailing list
systemd-dev

Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-09 Thread Topi Miettinen

On 9.12.2020 2.42, Jarkko Sakkinen wrote:

On Wed, Dec 09, 2020 at 02:15:28AM +0200, Jarkko Sakkinen wrote:

On Wed, Dec 09, 2020 at 01:15:27AM +0200, Topi Miettinen wrote:

As  a further argument, I just did this on a Fedora system:
$ find /dev -perm /ugo+x -a \! -type d -a \! -type l
No results.  So making /dev noexec doesn't seem to have any benefit.


It's no surprise that there aren't any executables in /dev since
removing MAKEDEV ages ago. That's not the issue, which is that
/dev is a writable directory (for UID=0 but no capabilities are
needed) and thus a potential location for constructing unapproved
executables if it is also mounted exec (W^X).


UID 0 can just change mount options, though, unless SELinux or similar is used. 
And SELinux can protect /dev just fine without noexec.


Well, mounting would need CAP_SYS_ADMIN in addition to UID 0. Also SELinux
is not universal and the policies might not contain all users or services.

-Topi


What's the data that supports having noexec /dev anyway? With root
access I can then just use something else like /dev/shm mount.

Has there been out in the wild real world cases that noexec mount
of would have prevented?


Typo: "of" = "of /dev"


For me this sounds a lot just something that "feels more secure"
without any measurable benefit. Can you prove me wrong?


The debate is circled around something not well defined. Of course you
get theoretically more safe system when you decrease priviliges anywhere
in the system. Like you could start do grazy things with stuff that
unprivilged user has access, in order to prevent malware to elevate to
UID 0 in the first place.

I think where this go intellectually wrong is that we are talking about
*default installation* of a distribution. That should have somewhat sane
common sense access control settings. For like a normal desktop user
noexec /dev will not do any possible favor.

Then there is the case when you want to harden installation for an
application, let's' say some server. In that case you will anyway
fine-tune the security settings and go grazy enough with hardening.
When you tailor a server, it's a standard practice to enumerate and
adjust the mount points if needed.


I think we agree that there's a need for either way to allow SGX (if 
default is hardened) or a hardening option (in the opposite case). For 
systemd I see two approaches:


1. Default noexec /dev, override with something like
- ExecPaths=/dev
- MountOptions=/dev:rw,exec,dev,nosuid
- or even MountOptions=/dev/sgx:rw,exec,dev,nosuid
- ProtectDev=no
- AllowSGX=yes

2. Default exec /dev, override with
- NoExecPaths=/dev
- MountOptions=/dev:rw,noexec,dev,nosuid
- ProtectDev=yes
- DenySGX=yes

I'd prefer 1. but of course 2. would be reasonable.


To summarize, I neither understand the intended target audience.


We have something in common: me neither. What's the target audience for 
SGX? What's the use case? What are the users: browsers, system services? 
How would applications use SGX? Should udev rules for /dev/sgx make it 
available to any logged in users with uaccess tags?


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-09 Thread Topi Miettinen

On 9.12.2020 2.15, Jarkko Sakkinen wrote:

On Wed, Dec 09, 2020 at 01:15:27AM +0200, Topi Miettinen wrote:

As  a further argument, I just did this on a Fedora system:
$ find /dev -perm /ugo+x -a \! -type d -a \! -type l
No results.  So making /dev noexec doesn't seem to have any benefit.


It's no surprise that there aren't any executables in /dev since
removing MAKEDEV ages ago. That's not the issue, which is that
/dev is a writable directory (for UID=0 but no capabilities are
needed) and thus a potential location for constructing unapproved
executables if it is also mounted exec (W^X).


UID 0 can just change mount options, though, unless SELinux or similar is used. 
And SELinux can protect /dev just fine without noexec.


Well, mounting would need CAP_SYS_ADMIN in addition to UID 0. Also SELinux
is not universal and the policies might not contain all users or services.

-Topi


What's the data that supports having noexec /dev anyway? With root
access I can then just use something else like /dev/shm mount.

Has there been out in the wild real world cases that noexec mount
of would have prevented?

For me this sounds a lot just something that "feels more secure"
without any measurable benefit. Can you prove me wrong?


I don't think security works that way. An attacker has various methods 
to choose from, some are more interesting than others. The case where 
rw,exec /dev would be interesting would imply that easier or more common 
avenues would be blocked, for example rw,exec /dev/shm, /tmp, /var/tmp, 
or /run/user/$UID/ for user. Also fileless malware with pure ROP/JOP 
approach with no need for any file system access is getting more common. 
It does not mean that it would not be prudent to block the relatively 
easy approaches too, including /dev.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-08 Thread Topi Miettinen

On 8.12.2020 23.30, Andy Lutomirski wrote:



On Dec 8, 2020, at 12:45 PM, Topi Miettinen  wrote:

On 8.12.2020 20.07, Andy Lutomirski wrote:

On Thu, Nov 19, 2020 at 10:05 AM Topi Miettinen  wrote:

On 19.11.2020 18.32, Zbigniew Jędrzejewski-Szmek wrote:

On Thu, Nov 19, 2020 at 08:17:08AM -0800, Andy Lutomirski wrote:

Hi udev people-

The upcoming Linux SGX driver has a device node /dev/sgx.  User code
opens it, does various setup things, mmaps it, and needs to be able to
create PROT_EXEC mappings.  This gets quite awkward if /dev is mounted
noexec.

Can udev arrange to make a device node executable on distros that make
/dev noexec?  This could be done by bind-mounting from an exec tmpfs.
Alternatively, the kernel could probably learn to ignore noexec on
/dev/sgx, but that seems a little bit evil.


I'd be inclined to simply drop noexec from /dev by default.
We don't do noexec on either /tmp or /dev/shm (because that causes immediate
problems with stuff like Java and cffi). And if you have those two at your
disposal anyway, having noexec on /dev doesn't seem important.


I'd propose to not enable exec globally, but if a service needs SGX, it
could use something like MountOptions=/dev:exec only in those cases
where it's needed. That way it's possible to disallow writable and
executable file systems for most services (which typically don't need
/tmp or /dev/shm either). Of course the opposite
(MountOptions=/dev:noexec) would be also possible, but I'd expect that
this would be needed to be used more often.


I imagine the opposite would be more sensible.  It seems odd to me
that we would want any SGX-using service to require both special mount
options and regular ACL permissions.


How common are thes SGX-using services? Will every service start using it 
without any special measures taken on it's behalf, or perhaps only a special 
SGX control tool needs access? What about unprivileged user applications, do 
they ever want to access SGX? Could something like Widevine deep in a browser 
need to talk to SGX in a DRM scheme?


I honestly don’t know. Widevine is probably some unholy mess of SGX and ME 
crud. But regular user programs may well end up using SGX for little non-evil 
enclaves, e.g. storing their keys securely.  It would be nice if unprivileged 
enclaves just work as long as the use has appropriate permissions on the device 
nodes.


Maybe, it would be also great if the access could be limited to those 
users or services which actually need it, by principle of least privilege.



SGX adoption has been severely hampered by the massive series of recent 
vulnerabilities and by Intel’s silly licensing scheme. The latter won’t be 
supported upstream.




As  a further argument, I just did this on a Fedora system:
$ find /dev -perm /ugo+x -a \! -type d -a \! -type l
No results.  So making /dev noexec doesn't seem to have any benefit.


It's no surprise that there aren't any executables in /dev since removing 
MAKEDEV ages ago. That's not the issue, which is that /dev is a writable 
directory (for UID=0 but no capabilities are needed) and thus a potential 
location for constructing unapproved executables if it is also mounted exec 
(W^X).


UID 0 can just change mount options, though, unless SELinux or similar is used. 
And SELinux can protect /dev just fine without noexec.


Well, mounting would need CAP_SYS_ADMIN in addition to UID 0. Also 
SELinux is not universal and the policies might not contain all users or 
services.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-08 Thread Topi Miettinen

On 8.12.2020 20.07, Andy Lutomirski wrote:

On Thu, Nov 19, 2020 at 10:05 AM Topi Miettinen  wrote:


On 19.11.2020 18.32, Zbigniew Jędrzejewski-Szmek wrote:

On Thu, Nov 19, 2020 at 08:17:08AM -0800, Andy Lutomirski wrote:

Hi udev people-

The upcoming Linux SGX driver has a device node /dev/sgx.  User code
opens it, does various setup things, mmaps it, and needs to be able to
create PROT_EXEC mappings.  This gets quite awkward if /dev is mounted
noexec.

Can udev arrange to make a device node executable on distros that make
/dev noexec?  This could be done by bind-mounting from an exec tmpfs.
Alternatively, the kernel could probably learn to ignore noexec on
/dev/sgx, but that seems a little bit evil.


I'd be inclined to simply drop noexec from /dev by default.
We don't do noexec on either /tmp or /dev/shm (because that causes immediate
problems with stuff like Java and cffi). And if you have those two at your
disposal anyway, having noexec on /dev doesn't seem important.


I'd propose to not enable exec globally, but if a service needs SGX, it
could use something like MountOptions=/dev:exec only in those cases
where it's needed. That way it's possible to disallow writable and
executable file systems for most services (which typically don't need
/tmp or /dev/shm either). Of course the opposite
(MountOptions=/dev:noexec) would be also possible, but I'd expect that
this would be needed to be used more often.



I imagine the opposite would be more sensible.  It seems odd to me
that we would want any SGX-using service to require both special mount
options and regular ACL permissions.


How common are thes SGX-using services? Will every service start using 
it without any special measures taken on it's behalf, or perhaps only a 
special SGX control tool needs access? What about unprivileged user 
applications, do they ever want to access SGX? Could something like 
Widevine deep in a browser need to talk to SGX in a DRM scheme?



As  a further argument, I just did this on a Fedora system:

$ find /dev -perm /ugo+x -a \! -type d -a \! -type l

No results.  So making /dev noexec doesn't seem to have any benefit.


It's no surprise that there aren't any executables in /dev since 
removing MAKEDEV ages ago. That's not the issue, which is that /dev is a 
writable directory (for UID=0 but no capabilities are needed) and thus a 
potential location for constructing unapproved executables if it is also 
mounted exec (W^X).


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-11-19 Thread Topi Miettinen

On 19.11.2020 18.32, Zbigniew Jędrzejewski-Szmek wrote:

On Thu, Nov 19, 2020 at 08:17:08AM -0800, Andy Lutomirski wrote:

Hi udev people-

The upcoming Linux SGX driver has a device node /dev/sgx.  User code
opens it, does various setup things, mmaps it, and needs to be able to
create PROT_EXEC mappings.  This gets quite awkward if /dev is mounted
noexec.

Can udev arrange to make a device node executable on distros that make
/dev noexec?  This could be done by bind-mounting from an exec tmpfs.
Alternatively, the kernel could probably learn to ignore noexec on
/dev/sgx, but that seems a little bit evil.


I'd be inclined to simply drop noexec from /dev by default.
We don't do noexec on either /tmp or /dev/shm (because that causes immediate
problems with stuff like Java and cffi). And if you have those two at your
disposal anyway, having noexec on /dev doesn't seem important.


I'd propose to not enable exec globally, but if a service needs SGX, it 
could use something like MountOptions=/dev:exec only in those cases 
where it's needed. That way it's possible to disallow writable and 
executable file systems for most services (which typically don't need 
/tmp or /dev/shm either). Of course the opposite 
(MountOptions=/dev:noexec) would be also possible, but I'd expect that 
this would be needed to be used more often.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-26 Thread Topi Miettinen

On 26.10.2020 18.24, Dave Martin wrote:

On Wed, Oct 21, 2020 at 10:44:46PM -0500, Jeremy Linton via Libc-alpha wrote:

Hi,

There is a problem with glibc+systemd on BTI enabled systems. Systemd
has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is
caught by the seccomp filter, resulting in service failures.

So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
This is obviously not desirable.

Various changes have been suggested, replacing the mprotect with mmap calls
having PROT_BTI set on the original mapping, re-mmapping the segments,
implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
and various modification to seccomp to allow particular mprotect cases to
bypass the filters. In each case there seems to be an undesirable attribute
to the solution.

So, whats the best solution?


Unrolling this discussion a bit, this problem comes from a few sources:

1) systemd is trying to implement a policy that doesn't fit SECCOMP
syscall filtering very well.

2) The program is trying to do something not expressible through the
syscall interface: really the intent is to set PROT_BTI on the page,
with no intent to set PROT_EXEC on any page that didn't already have it
set.


This limitation of mprotect() was known when I originally added PROT_BTI,
but at that time we weren't aware of a clear use case that would fail.


Would it now help to add something like:

int mchangeprot(void *addr, size_t len, int old_flags, int new_flags)
{
int ret = -EINVAL;
mmap_write_lock(current->mm);
if (all vmas in [addr .. addr + len) have
their mprotect flags set to old_flags) {

ret = mprotect(addr, len, new_flags);
}

mmap_write_unlock(current->mm);
return ret;
}


libc would now be able to do

mchangeprot(addr, len, PROT_EXEC | PROT_READ,
PROT_EXEC | PROT_READ | PROT_BTI);

while systemd's MDWX filter would reject the call if

(new_flags & PROT_EXEC) &&
(!(old_flags & PROT_EXEC) || (new_flags & PROT_WRITE)



This won't magically fix current code, but something along these lines
might be better going forward.


Thoughts?


Looks good to me.

-Topi

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-26 Thread Topi Miettinen

On 26.10.2020 16.52, Catalin Marinas wrote:

On Sat, Oct 24, 2020 at 02:01:30PM +0300, Topi Miettinen wrote:

On 23.10.2020 12.02, Catalin Marinas wrote:

On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote:

Regardless, it makes sense to me to have the kernel load the executable
itself with BTI enabled by default. I prefer gaining Catalin's suggested
patch[2]. :)

[...]

[2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/


I think I first heard the idea at Mark R ;).

It still needs glibc changes to avoid the mprotect(), or at least ignore
the error. Since this is an ABI change and we don't know which kernels
would have it backported, maybe better to still issue the mprotect() but
ignore the failure.


What about kernel adding an auxiliary vector as a flag to indicate that BTI
is supported and recommended by the kernel? Then dynamic loader could use
that to detect that a) the main executable is BTI protected and there's no
need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC
pages.


We could add a bit to AT_FLAGS, it's always been 0 for Linux.


Great!


In absence of the vector, the dynamic loader might choose to skip doing
PROT_BTI at all (since the main executable isn't protected anyway either, or
maybe even the kernel is up-to-date but it knows that it's not recommended
for some reason, or maybe the kernel is so ancient that it doesn't know
about BTI). Optionally it could still read the flag from ELF later (for
compatibility with old kernels) and then do the mprotect() dance, which may
trip seccomp filters, possibly fatally.


I think the safest is for the dynamic loader to issue an mprotect() and
ignore the EPERM error. Not all user deployments have this seccomp
filter, so they can still benefit, and user can't tell whether the
kernel change has been backported.


But the seccomp filter can be set to kill the process, so that's 
definitely not the safest way. I think safest is that when the AT_FLAGS 
bit is seen, ld.so doesn't do any mprotect() calls but instead when 
mapping the segments, mmap() flags are adjusted to include PROT_BTI, so 
mprotect() calls are not necessary. If there's no seccomp filter, 
there's no disadvantage for avoiding the useless mprotect() calls.


I'd expect the backported kernel change to include both aux vector and 
also using PROT_BTI for the main executable. Then the logic would work 
with backported kernels as well.


If there's no aux vector, all bets are off. The kernel could be old and 
unpatched, even so old that PROT_BTI is not known. Perhaps also in the 
future there may be new technologies which have replaced BTI and the 
kernel could want a previous generation ld.so not to try to use BTI, so 
this could be also indicated with the lack of aux vector. The dynamic 
loader could still attempt to mprotect() the pages, but that could be 
fatal. Getting to the point where the error can be ignored means that 
there's no seccomp filter, at least none set to kill. Perhaps the pain 
is only temporary, new or patched kernels should eventually replace the 
old versions.



Now, if the dynamic loader silently ignores the mprotect() failure on
the main executable, is there much value in exposing a flag in the aux
vectors? It saves a few (one?) mprotect() calls but I don't think it
matters much. Anyway, I don't mind the flag.


Saving a few system calls is indeed not an issue, but not being able to 
use MDWX and PROT_BTI simultaneously was the original problem (service 
failures).


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-24 Thread Topi Miettinen

On 23.10.2020 20.52, Salvatore Mesoraca wrote:

Hi,

On Thu, 22 Oct 2020 at 23:24, Topi Miettinen  wrote:

SARA looks interesting. What is missing is a prctl() to enable all W^X
protections irrevocably for the current process, then systemd could
enable it for services with MemoryDenyWriteExecute=yes.


SARA actually has a procattr[0] interface to do just that.
There is also a library[1] to help using it.


That means that /proc has to be available and writable at that point, so 
setting up procattrs has to be done before mount namespaces are set up. 
In general, it would be nice for sandboxing facilities in kernel if 
there would be a way to start enforcing restrictions only at next 
execve(), like setexeccon() for SELinux and aa_change_onexec() for 
AppArmor. Otherwise the exact order of setting up various sandboxing 
options can be very tricky to arrange correctly, since each option may 
have a subtle effect to the sandboxing features enabled later. In case 
of SARA, the operations done between shuffling the mount namespace and 
before execve() shouldn't be affected so it isn't important. Even if it 
did (a new sandboxing feature in the future would need trampolines or 
JIT code generation), maybe the procattr file could be opened early but 
it could be written closer to execve().


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-24 Thread Topi Miettinen

On 23.10.2020 12.02, Catalin Marinas wrote:

On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote:

Regardless, it makes sense to me to have the kernel load the executable
itself with BTI enabled by default. I prefer gaining Catalin's suggested
patch[2]. :)

[...]

[2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/


I think I first heard the idea at Mark R ;).

It still needs glibc changes to avoid the mprotect(), or at least ignore
the error. Since this is an ABI change and we don't know which kernels
would have it backported, maybe better to still issue the mprotect() but
ignore the failure.


What about kernel adding an auxiliary vector as a flag to indicate that 
BTI is supported and recommended by the kernel? Then dynamic loader 
could use that to detect that a) the main executable is BTI protected 
and there's no need to mprotect() it and b) PROT_BTI flag should be 
added to all PROT_EXEC pages.


In absence of the vector, the dynamic loader might choose to skip doing 
PROT_BTI at all (since the main executable isn't protected anyway 
either, or maybe even the kernel is up-to-date but it knows that it's 
not recommended for some reason, or maybe the kernel is so ancient that 
it doesn't know about BTI). Optionally it could still read the flag from 
ELF later (for compatibility with old kernels) and then do the 
mprotect() dance, which may trip seccomp filters, possibly fatally.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Topi Miettinen

On 22.10.2020 23.02, Kees Cook wrote:

On Thu, Oct 22, 2020 at 01:39:07PM +0300, Topi Miettinen wrote:

But I think SELinux has a more complete solution (execmem) which can track
the pages better than is possible with seccomp solution which has a very
narrow field of view. Maybe this facility could be made available to
non-SELinux systems, for example with prctl()? Then the in-kernel MDWX could
allow mprotect(PROT_EXEC | PROT_BTI) in case the backing file hasn't been
modified, the source filesystem isn't writable for the calling process and
the file descriptor isn't created with memfd_create().


Right. The problem here is that systemd is attempting to mediate a
state change using only syscall details (i.e. with seccomp) instead of
a stateful analysis. Using a MAC is likely the only sane way to do that.
SELinux is a bit difficult to adjust "on the fly" the way systemd would
like to do things, and the more dynamic approach seen with SARA[1] isn't
yet in the kernel.


SARA looks interesting. What is missing is a prctl() to enable all W^X 
protections irrevocably for the current process, then systemd could 
enable it for services with MemoryDenyWriteExecute=yes.


I didn't also see specific measures against memfd_create() or file 
system W, but perhaps those can be added later. Maybe pkey_mprotect() 
is not handled either unless it uses the same LSM hook as mprotect().



Trying to enforce memory W^X protection correctly
via seccomp isn't really going to work well, as far as I can see.


Not in general, but I think it can work well in context of system 
services. Then you can ensure that for a specific service, 
memfd_create() is blocked by seccomp and the file systems are W^X 
because of mount namespaces etc., so there should not be any means to 
construct arbitrary executable pages.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Topi Miettinen

On 22.10.2020 10.54, Szabolcs Nagy wrote:

The 10/21/2020 22:44, Jeremy Linton wrote:

There is a problem with glibc+systemd on BTI enabled systems. Systemd
has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is
caught by the seccomp filter, resulting in service failures.

So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
This is obviously not desirable.

Various changes have been suggested, replacing the mprotect with mmap calls
having PROT_BTI set on the original mapping, re-mmapping the segments,
implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
and various modification to seccomp to allow particular mprotect cases to
bypass the filters. In each case there seems to be an undesirable attribute
to the solution.

So, whats the best solution?


the easy fix in glibc is to ignore mprotect(PROT_BTI|PROT_EXEC)
failures, so programs work with seccomp filters, but bti gets
disabled (it's unreasonable to expect bti protection if mprotect
is filtered). it will be a nasty silent failure though.


Some may also want to use seccomp filters so that they will immediately 
kill the process and in this case they couldn't do it.



and i'm also considering a fix that re-mmaps the executable
segment with PROT_BTI instead of mprotect since that is not
filtered. unfortunately the main exe is mmaped by the kernel
without PROT_BTI and the libc does not have the fd to re-mmap.
(bti can be left off for the main exe if mprotect fails and
later we can teach the kernel to add bti there.) currently
this is not a complete fix so i'm a bit hesitant about it.

as for a kernel side fix: if there is a way to only filter
PROT_EXEC mprotect on mappings that are not yet PROT_EXEC
that would solve this problem (but likely needs new syscall
or seccomp capability).


Problem with seccomp MDWX is that it's still possible for malicious 
programs to circumvent the filter by using memfd_create(), fill the 
memory with desired content and then use mmap(,,PROT_EXEC) to make it 
executable without triggering seccomp. This can be mitigated by 
filtering also memfd_create(), but then some programs want to use it. 
Also the protection can be bypassed if the program can write to a file 
system which isn't mounted with "noexec". This can be mitigated with 
private mount namespaces and global mount options, but again some 
programs are written to expect W & X.


But I think SELinux has a more complete solution (execmem) which can 
track the pages better than is possible with seccomp solution which has 
a very narrow field of view. Maybe this facility could be made available 
to non-SELinux systems, for example with prctl()? Then the in-kernel 
MDWX could allow mprotect(PROT_EXEC | PROT_BTI) in case the backing file 
hasn't been modified, the source filesystem isn't writable for the 
calling process and the file descriptor isn't created with memfd_create().


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Topi Miettinen

On 22.10.2020 12.31, Catalin Marinas wrote:

On Thu, Oct 22, 2020 at 10:38:23AM +0200, Lennart Poettering wrote:

On Do, 22.10.20 09:29, Szabolcs Nagy (szabolcs.n...@arm.com) wrote:

The dynamic loader has to process the LOAD segments to get to the ELF
note that says to enable BTI.  Maybe we could do a first pass and load
only the segments that cover notes.  But that requires lots of changes
to generic code in the loader.


What if the loader always enabled BTI for PROT_EXEC pages, but then when
discovering that this was a mistake, mprotect() the pages without BTI? Then
both BTI and MDWX would work and the penalty of not getting MDWX would fall
to non-BTI programs. What's the expected proportion of BTI enabled code vs.
disabled in the future, is it perhaps expected that a distro would enable
the flag globally so eventually only a few legacy programs might be
unprotected?


i thought mprotect(PROT_EXEC) would get filtered
with or without bti, is that not the case?


We can adjust the filter in systemd to match any combination of
flags to allow and to deny.


Yes but Szabolcs' point to Topi was that if we can adjust the filters to
allow mprotect(PROT_EXEC), why not allow mprotect(PROT_EXEC|PROT_BTI)
instead? Anyway, I see the MDWX and BTI as complementary policies so
ideally we shouldn't have to choose between one or the other. If we
allow mprotect(PROT_EXEC), that would override MDWX and also disable
BTI.


Allowing mprotect(PROT_EXEC|PROT_BTI) would mean that all you need to 
circumvent MDWX is to add PROT_BTI flag. I'd suggest getting the flags 
right at mmap() time or failing that, reverting the PROT_BTI for legacy 
programs later.


Could the kernel tell the loader of the BTI situation with auxiliary 
vectors? Then it would be easy for the loader to always use the best 
mmap() flags without ever needing to mprotect().


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Topi Miettinen

On 22.10.2020 11.29, Szabolcs Nagy wrote:

The 10/22/2020 11:17, Topi Miettinen via Libc-alpha wrote:

On 22.10.2020 10.54, Florian Weimer wrote:

* Lennart Poettering:

Did you see Topi's comments on the systemd issue?

https://github.com/systemd/systemd/issues/17368#issuecomment-710485532

I think I agree with this: it's a bit weird to alter the bits after
the fact. Can't glibc set up everything right from the begining? That
would keep both concepts working.


The dynamic loader has to process the LOAD segments to get to the ELF
note that says to enable BTI.  Maybe we could do a first pass and load
only the segments that cover notes.  But that requires lots of changes
to generic code in the loader.


What if the loader always enabled BTI for PROT_EXEC pages, but then when
discovering that this was a mistake, mprotect() the pages without BTI? Then
both BTI and MDWX would work and the penalty of not getting MDWX would fall
to non-BTI programs. What's the expected proportion of BTI enabled code vs.
disabled in the future, is it perhaps expected that a distro would enable
the flag globally so eventually only a few legacy programs might be
unprotected?


i thought mprotect(PROT_EXEC) would get filtered
with or without bti, is that not the case?


It would be filtered, but the idea is that with modern binaries this 
would not happen since the pages would be mapped with mmap(,, PROT_EXEC 
| PROT_BTI,,) which is OK for purposes MDWX. The loader would have to 
use mprotect(PROT_EXEC) to get rid of PROT_BTI only for the legacy binaries.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Topi Miettinen

On 22.10.2020 10.54, Florian Weimer wrote:

* Lennart Poettering:


On Mi, 21.10.20 22:44, Jeremy Linton (jeremy.lin...@arm.com) wrote:


Hi,

There is a problem with glibc+systemd on BTI enabled systems. Systemd
has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is
caught by the seccomp filter, resulting in service failures.

So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
This is obviously not desirable.

Various changes have been suggested, replacing the mprotect with mmap calls
having PROT_BTI set on the original mapping, re-mmapping the segments,
implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
and various modification to seccomp to allow particular mprotect cases to
bypass the filters. In each case there seems to be an undesirable attribute
to the solution.

So, whats the best solution?


Did you see Topi's comments on the systemd issue?

https://github.com/systemd/systemd/issues/17368#issuecomment-710485532

I think I agree with this: it's a bit weird to alter the bits after
the fact. Can't glibc set up everything right from the begining? That
would keep both concepts working.


The dynamic loader has to process the LOAD segments to get to the ELF
note that says to enable BTI.  Maybe we could do a first pass and load
only the segments that cover notes.  But that requires lots of changes
to generic code in the loader.


What if the loader always enabled BTI for PROT_EXEC pages, but then when 
discovering that this was a mistake, mprotect() the pages without BTI? 
Then both BTI and MDWX would work and the penalty of not getting MDWX 
would fall to non-BTI programs. What's the expected proportion of BTI 
enabled code vs. disabled in the future, is it perhaps expected that a 
distro would enable the flag globally so eventually only a few legacy 
programs might be unprotected?


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seccomp allow/log action

2020-07-08 Thread Topi Miettinen

On 8.7.2020 17.47, Chris PeBenito wrote:
I would like to implement a unit option that would make the seccomp 
action SCMP_ACT_LOG so that I can test SystemCallFilter settings without 
killing the services, like SELinux permissive mode.


I was reading this github issue about seccomp actions from last year:

https://github.com/systemd/systemd/issues/11967

While it mentioned other actions, it was mainly about changing the kill 
action to kill the process rather than just the offending thread.  There 
wasn't a solid conclusion about how allow/log actions would work in 
terms of unit options.  I figure one option is adding a new option like 
SystemCallFillterAllow=bool that would conflict with 
SystemCallFilterErrno. If true it would set SCMP_ACT_LOG for the 
action.  Having a setting for SCMP_ACT_ALLOW seems redundant since it's 
equivalent to commenting out the SystemCallFilter option since there's 
no logging.


I think it would be more flexible to extend the error code return per 
system call, like

SystemCallFilter=gettimeofday:LOG

For global error action, I'd propose SystemCallErrorNumber= to be 
superseded by more generic


SystemCallErrorAction= KILL | LOG | errno code

-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SECLABEL issue into udev

2020-03-10 Thread Topi Miettinen
On 10.3.2020 11.59, Valerii Chernous -X (vchernou - GLOBALLOGIC INC at 
Cisco) wrote:

Hi Team,
I send this email again because don't receive answer on previous message.

I have issue with SECLABEL into systemd udevadm 243 and I see that 
mainline also have this issue.

It look like Yu forgot initialize data into commit:
25de7aa7b90 (Yu Watanabe 2019-04-25 01:21:11 +0200 924)

If I add something like:
SECLABEL{selinux}="some info"
to udev rule I got a SIGSEGV into udevadm into this rule.
On my opinion next one line patch can fix this issue:

diff --git a/src/udev/udev-rules.c b/src/udev/udev-rules.c
index b9b350d1ef..e1e8273468 100644
--- a/src/udev/udev-rules.c
+++ b/src/udev/udev-rules.c
@@ -921,7 +921,7 @@ static int parse_token(UdevRules *rules, const char 
*key, char *attr, UdevRuleOp

  op = OP_ASSIGN;
  }

-    r = rule_line_add_token(rule_line, TK_A_SECLABEL, op, 
value, NULL);
+    r = rule_line_add_token(rule_line, TK_A_SECLABEL, op, 
value, attr);


Looks good to me, but please make a pull request.

-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Failed to determine supported controllers: No such process

2020-01-21 Thread Topi Miettinen

On 21.1.2020 18.11, Lennart Poettering wrote:

On Di, 21.01.20 17:02, Reindl Harald (h.rei...@thelounge.net) wrote:


i was already considering "hidepid" as the root cause
systemd-243.5-1.fc31.x86_64 other than
systemd-241-12.git323cdf4.fc30.x86_64 can no longer deal with


hidepid= is broken. We don't support that, sorry. Various components
can't deal with it.

We'd like to support this, but the way it is defined now cannot work
for general purpose OS.

There was work on fixing it in the kernel:

https://lwn.net/Articles/738597/

That patch set got discussed, then forgotten, picked up again and
forgotten. it's a neverending story. If it eventualls gets merged we
can happily support it in systemd.


Alexey Gladkov submitted v6 of the patch series recently:

https://lkml.org/lkml/2019/12/25/102

-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd unit file to remount /home /tmp /dev/shm /run with nosuid, nodev

2020-01-02 Thread Topi Miettinen

On 2.1.2020 21.08, Josh Triplett wrote:

Lennart Poettering wrote:

And noexec doesn't really make much sense for these dirs, as this
blocks mmap() with MAP_EXEC and there are plenty apps that want to use
that. Moreover "noexec" is at best a protection against accidental
execution and not a security mechanism since it is trivially easy to
circumvent (just call the interpreter directly with the file to
execute as first arg, which for ELF means "/lib64/ld-linux-x86-64.so.2 $BINARY")


That workaround doesn't actually work anymore; the former (blocking mmap
with MAP_EXEC) exists specifically to protect against the latter
(running the interpreter directly).

$ mount | grep '/run '
tmpfs on /run type tmpfs 
(rw,nosuid,nodev,noexec,relatime,size=1620848k,mode=755)
$ sudo cp -a /bin/ls /run/ls
$ /run/ls
bash: /run/ls: Permission denied
(126) $ /lib64/ld-linux-x86-64.so.2 /run/ls
/run/ls: error while loading shared libraries: /run/ls: failed to map segment 
from shared object
(127) $

It's theoretically possible to work around *that* if you have permission
to run arbitrary code and to remap memory from write to execute (both of
which might also be locked down). But even without that, mount -o noexec
does meaningfully improve security, and the trivial workaround no longer
works.


It's not theoretical at all. You can use memfd_create() to prepare a 
file without it ever going to disk and then use 
mmap(,,PROT_EXEC|PROT_READ,,) to put it into address space for execution.


Seccomp can be used to block memfd_create() but then some programs want 
to use it, so it can't be blocked globally. SELinux PROCESS__EXECMEM 
check may also be able to stop this, I haven't tried.


But then many interpreters (designed as such or accidentally 
Turing-complete decoders) can also be used to go around all these file 
system level checks.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] sd-boot kickstart

2019-10-02 Thread Topi Miettinen

On 2.10.2019 9.05, Chris Murphy wrote:

On Tue, Oct 1, 2019 at 1:05 AM Damian Ivanov  wrote:


Hello,

I watched the video and presentation
https://cfp.all-systems-go.io/media/sdboot-asg2019.pdf
I could not agree more! Anaconda/Kickstart install grub as the
bootloader. Is there some hidden option to use sd-boot instead or is
it necessary to install sd-boot manually after the OS is deployed?


I only see extlinux as an alternative to grub:
https://pykickstart.readthedocs.io/en/latest/


I prefer rEFInd for EFI boot. It does not seem to break as easily as 
grub or sd-boot when something changes, also I think configuration and 
installation is more logical. It's supported in Debian, probably not for 
first installation.


http://www.rodsbooks.com/refind/

-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Re: [systemd-devel] keyrings and dbus

2019-06-13 Thread Topi Miettinen

On 13.6.2019 20.52, Simon McVittie wrote:

On Thu, 13 Jun 2019 at 15:43:36 +0300, Topi Miettinen wrote:

The sessions with slightly different scopes might be useful in some cases.
But if this is not the case, would it be possible to unify the scopes and
make systemd --user part of the login session?


I don't think so. Consider these two scenarios, which I hope you'll agree
should both be allowed:

* ssh user@mymachine
* with the ssh session still open, log in to gdm on mymachine as user

* log in to gdm on mymachine as user
* with the X11 or Wayland session still open, ssh user@mymachine

If systemd --user is part of a login session, then in each case it would
have to be started as a child process of the first way you logged in.
This would result in your dbus-daemon --session and your
gnome-terminal-server belonging to your ssh login session in the first
scenario, and your graphical login session in the second (even though
in both cases, gnome-terminal-server is drawing windows onto your
graphical login session).

It gets even weirder if you log out from the first login session, leaving
the second one logged in, and the long-running systemd --user and
dbus-daemon --session as members of a login session that no longer exists.

The "user-session" concept is primarily useful when login sessions overlap
like this: typically you'd have 0-1 graphical login sessions (gdm, etc.),
0 or more remote login sessions (ssh, etc.), 0 or more login sessions on
a virtual console or serial console (getty/login) and 0 or more cron jobs.


These are valid cases. But I think the ssh session would not actually 
need most of the services launched by systemd --user, like 
gnome-terminal-server in your example.


Perhaps the answer is then not to use systemd --user, but my motivation 
to maximise use of systemd is that then I can use its containment 
features, like seccomp easily and tuned for each process, like 
pulseaudio and redshift. For the ssh login session, these would not be 
started at all (ideally) as they are useless in that login session.



Or the reverse, start the login session by systemd --user?


systemd --user is unprivileged and does not provide a transition from
not-logged-in to logged-in state (it isn't in the same position as login,
sshd, gdm, cron etc.), so it cannot start login sessions.


I meant that gdm would do the privileged transition and then it would 
just start unprivileged systemd --user, which would launch rest of X11 
setup.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Re: [systemd-devel] keyrings and dbus

2019-06-13 Thread Topi Miettinen

On 12.6.2019 22.20, Simon McVittie wrote:

On Wed, 12 Jun 2019 at 19:57:39 +0300, Andrei Borzenkov wrote:

12.06.2019 19:18, Simon McVittie пишет:

systemd user services are not part of a particular login session. They
exist outside all login sessions (look at systemd-cgls).


gnome-terminal surely *is* part of particular login session.
Unfortunately it is spawned by gnome-terminal-server which itself is
started as systemd user session inside sytsemd-user ... session?


The gnome-terminal launcher stub (/usr/bin/gnome-terminal) is part of
a particular login session, but does not persist beyond startup. All it
does is to communicate with gnome-terminal-server and then exit.

gnome-terminal-server exists outside the scope of any particular login
session, and is the component that actually draws windows and runs
a shell.

This has the consequence that GNOME Terminal windows, and the shell that
runs inside them, are drawn onto the X11 or Wayland display that belongs
to a login session, but are not actually part of that login session. You
can see this by looking at the process and cgroup hierarchies.


It's your choice - but you can't have it both ways. The dbus-daemon
can't be both inside a login session, and shared between login sessions.


But that means that as long as there are applications that are part of
user session and started by dbus-daemon, dbus-daemon simply cannot be
part of systemd-user, correct?


I'm very deliberately using the term "login session" because it is a
jargon term identifying a specific thing in the conceptual model used
by systemd-logind, which is essentially the same as the conceptual model
used by non-systemd things like PAM and Linux audit.

A process enters a login session in privileged code (like login(1) or gdm
or similar) at the transition between "not logged in" and "logged in". All
children of a process that is in a login session are in the same login
session; if a process that is not in any login session starts a child
process, the child is also not in any login session, unless the parent
process is privileged and is deliberately starting a new login session.
In systemd, a login session is represented by the session-$N.scope cgroup,
where $N is the session ID.

"User session" means something different: in D-Bus, we use it to refer
to the collection of processes belonging to a user who currently has
at least one login session. It includes all their login sessions (even
if there is more than one), as well as things that belong to that user
but are not in a login session, like systemd --user. In systemd, a user
session is represented by the user-$UID.slice cgroup, where $UID is the
user's numeric uid.

systemd --user is not part of a login session. If you are starting
dbus-daemon --session as a child process of systemd --user, then the
definitions above mean that the dbus-daemon is not part of a login
session either.


The sessions with slightly different scopes might be useful in some 
cases. But if this is not the case, would it be possible to unify the 
scopes and make systemd --user part of the login session? Now it is 
started by PID1, but in that case it probably should be started by 
pam_systemd.so or more directly (e.g. /etc/X11/Xsession.d/00systemd)? Or 
the reverse, start the login session by systemd --user? I've tried 
lightly both approaches but there always seems to be some problems.


-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Re: [systemd-devel] user service not starting on login

2019-05-12 Thread Topi Miettinen

On 12.5.2019 18.08, Matt Zagrabelny wrote:

Hey Mantas and others,

On Thu, May 9, 2019 at 11:57 PM Mantas Mikulėnas > wrote:


On Fri, May 10, 2019 at 5:22 AM Matt Zagrabelny mailto:mzagr...@d.umn.edu>> wrote:

Greetings,

I am attempting to get a user service running on my session login.

The unit is called jack. I've enabled it via:

$ systemctl --user enable jack

When I boot up the system and log in I see that it is inactive.
I can start it manually without issue:

$ systemctl --user status jack
● jack.service - JACK 2
    Loaded: loaded
(/home/theophilus/.config/systemd/user/jack.service; enabled;
vendor preset: enabled)
    Active: inactive (dead)

$ journalctl --user -u jack -b
-- Logs begin at Thu 2019-05-09 20:54:31 CDT, end at Thu
2019-05-09 21:13:53 CDT. --
-- No entries --

$ systemctl --user cat jack
# /home/theophilus/.config/systemd/user/jack.service
[Unit]
Description=JACK 2
Before=sound.target
Before=pulseaudio.service
Requires=dbus.socket

[Service]
Type=dbus
BusName=org.jackaudio.Controller


Among other things, the bus name seems to be incorrect. In
jack2-dbus the only claimed name appears to be "org.jackaudio.service".

ExecStart=/usr/bin/jack_control start


The jack_control program does not spawn nor directly execute the
actual jackd daemon. Instead it *remotely*
activates org.jackaudio.service through D-Bus (you'll see jackdbus
under the dbus.service cgroup), then sends it a single method call
and exits.

In other words, jack_control is not a Type=dbus service, it's a
oneshot script that controls another Type=dbus service. You can
imagine that it's just a wrapper around `dbus-send` or `gdbus call`,
and is something you'd instead use in JACK's *ExecStartPost=*.

A direct conversion of jackdbus to a systemd service would look like
this – because of the way jackdbus is written, it always needs that
extra command to be sent over D-Bus (either by running `jack_control
start` or by using the manual tools):

(~/.config/systemd/user/jack.service)
[Service]
Type=dbus
BusName=org.jackaudio.service
ExecStart=/usr/bin/jackdbus auto
#ExecStartPost=/usr/bin/jack_control start
#ExecStartPost=/usr/bin/gdbus call -e -d org.jackaudio.service -o
/org/jackaudio/Controller -m org.jackaudio.JackControl.StartServer
ExecStartPost=/usr/bin/busctl call --user org.jackaudio.service
/org/jackaudio/Controller org.jackaudio.JackControl StartServer


I used this service file (thank you for providing it!), but it seems 
jack is still not starting when the user logs in to the session:


$ systemctl --user status jack
● jack.service
    Loaded: loaded (/home/theophilus/.config/systemd/user/jack.service; 
enabled; vendor preset: enabled)

    Active: inactive (dead)

$ ps -ef | grep jack
theophi+  1200  1183  0 09:30 pts/3    00:00:00 grep jack

$ systemctl --user status jack
● jack.service
    Loaded: loaded (/home/theophilus/.config/systemd/user/jack.service; 
enabled; vendor preset: enabled)

    Active: inactive (dead)

$ systemctl --user cat jack
# /home/theophilus/.config/systemd/user/jack.service
[Service]
Type=dbus
BusName=org.jackaudio.service
ExecStart=/usr/bin/jackdbus auto
#ExecStartPost=/usr/bin/jack_control start
#ExecStartPost=/usr/bin/gdbus call -e -d org.jackaudio.service -o 
/org/jackaudio/Controller -m org.jackaudio.JackControl.StartServer
ExecStartPost=/usr/bin/busctl call --user org.jackaudio.service 
/org/jackaudio/Controller org.jackaudio.JackControl StartServer


You are missing:

[Install]
WantedBy=default.target

Then when you enable it, systemd should create a symbolic link in directory
.config/systemd/user/default.target.wants
to your service.

-Topi



$ systemctl --user is-enabled jack
enabled

$ systemctl --user is-active jack
inactive

$ systemctl --user | grep jack

$ systemctl --user | grep bus
at-spi-dbus-bus.service  
                              loaded active running   Accessibility 
services bus
dbus.service
                               loaded active running   D-Bus User 
Message Bus
dbus.socket  
                              loaded active running   D-Bus User Message 
Bus Socket


If I start ardour (which makes use of jack) it (ardour) tells me jack is 
not running.


I can start the jack service:

$ systemctl --user start jack

So...

How can I figure out why it is not starting automatically?

Thanks!

-m

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org

Re: [systemd-devel] systemd-journald.service not using ProtectSystem=strict?

2019-01-11 Thread Topi Miettinen

Hello,

I have no problems using this with Debian testing:

# /etc/systemd/system/systemd-journald.service.d/override.conf
[Service]
CapabilityBoundingSet=~CAP_MAC_OVERRIDE CAP_SYS_PTRACE
InaccessiblePaths=-/dev/pts -/dev/shm -/dev/mqueue -/dev/hugepages 
-/setuid -/boot -/tmp -/var/tmp -/bin -/sbin -/usr/bin -/run/lock 
-/lost+found -/media -/mnt -/opt -/srv -/proc/bus

Personality=x86-64
PrivateNetwork=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectSystem=strict
ReadOnlyPaths=-/
ReadWritePaths=-/var -/run
SystemCallFilter=~@resources
UMask=0077

It could be obvious, but when testing journald configuration, if 
journald is started from initramfs, the config file should be also 
present there.


-Topi

On 10.1.2019 20.06, Reindl Harald wrote:

looking at the current security issues and  how it triggers the
troll-army i wonder why systemd-journald.service  is not restricted from
at least write to /usr and /root at least on Fedora 28 (that it's not
vulernable because of compiler hardening is just luck)

[root@testserver:~]$ cat
/etc/systemd/system/systemd-journald.service.d/security.conf
[Service]
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/run
ReadWritePaths=/var

[root@testserver:~]$ systemctl status systemd-journald.service
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service;
static; vendor preset: disabled)
   Drop-In: /etc/systemd/system/systemd-journald.service.d
└─security.conf
Active: active (running) since Thu 2019-01-10 19:00:30 CET; 42s ago
  Docs: man:systemd-journald.service(8)
man:journald.conf(5)
  Main PID: 398 (systemd-journal)
Status: "Processing requests..."
 Tasks: 1 (limit: 512)
Memory: 4.7M
CGroup: /system.slice/systemd-journald.service
└─398 /usr/lib/systemd/systemd-journald

Jan 10 19:00:30 testserver.rhsoft.net systemd-journald[398]: Journal started
Jan 10 19:00:30 testserver.rhsoft.net systemd-journald[398]: Runtime
journal (/run/log/journal/b3591cfc6c4e65ea231a7d08489dc40f) is 2.5M, max
10.0M, 7.5M free.
Jan 10 19:00:30 testserver.rhsoft.net systemd-journald[398]: Runtime
journal (/run/log/journal/b3591cfc6c4e65ea231a7d08489dc40f) is 2.5M, max
10.0M, 7.5M free.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel



___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Requirements for successful mounting of RootImage?

2017-08-20 Thread Topi Miettinen
Sorry, your messages were in spam folder (must be due to some kind of
evil plan by the systemd haters), so I didn't notice them until now.

On 07/31/17 13:50, Lennart Poettering wrote:
> On So, 30.07.17 13:58, Topi Miettinen (toiwo...@gmail.com) wrote:
> 
>> Hey,
>>
>> I have this test.service unit:
>> [Unit]
>>
>> [Install]
>> WantedBy=multi-user.target
>>
>> [Service]
>> Type=oneshot
>> ExecStart=/bin/ls -lR
>> RootImage=/fs
>> MountAPIVFS=no
> 
> Any reason you turn this off? This is likely to break sooner or later,
> so it would make a ton of sense to test things first with it left on,
> before checking anything else.

OK, but that did not help.

>> The file /fs has a MBR partition table:
>> Disk /dev/loop0: 1.1 MiB, 1192960 bytes, 2330 sectors
>> Units: sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>> Disklabel type: dos
>> Disk identifier: 0x3990f3e6
>>
>> Device   Boot Start   End Sectors  Size Id Type
>> /dev/loop0p1 *   34  23292296  1.1M 83 Linux
> 
> That should work. See if "systemd-nspawn -i" can get a shell in it. If
> so, RootImage= should work too, it uses the same code.
> 
> Also, consider invoking /usr/lib/systemd/systemd-dissect on the image
> file, it will tell you whether it can make sense of the image, and how
> it would mount it.

# /lib/systemd/systemd-dissect /root.sqsh
Found writable 'root' partition of type squashfs without verity
(/dev/block/7:0)

>> Perhaps I miss some RootImage requirements? What exactly they are?
> 
> They are documented briefly in "systemd-nspawn's" --image= setting.

I tried systemd-nspawn with the image, but that also refuses. There's
this error:
# systemd-nspawn --image=/root.sqsh
Spawning container root.sqsh on /root.sqsh.
Press ^] three times within 1s to kill container.
Timezone Europe/Helsinki does not exist in container, not updating
container timezone.
Failed to create /var/log: Read-only file system

It looks like the image is mounted read-only:
2427  mkdir("/tmp/nspawn-root-jlYu4k/var/log", 0755) = -1 EROFS
(Read-only file system)

If I add "--tmpfs=/var" and move the mount_custom() call in nspawn.c
between setup_seccomp() and setup_timezone(), there's no error and
systemd-nspawn can mount the image and run the command. But it would be
nice to understand why the image is mounted read-only in the first place.

Adding a read-write /var to test.service does not help either:
BindPaths=/tmp/var.test:/var

The contents seem to be fine because there's no error when using nspawn
with --directory.

> That said, if systemd actually mounted something, then the image is
> fine. Most likely something is simply borked in the namespacing code,
> and that is kind hard to debug, because logging is already turned off
> at that point. It should be relatively easy to patch that in
> temporarily though, i.e. find apply_mount_namespace() in
> src/core/execute.c and place a log_open() before the setup_namespace()
> invocation, and check if this improves logging for you.

I'll try that next.

> 
> Lennart
> 

-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Requirements for successful mounting of RootImage?

2017-07-30 Thread Topi Miettinen
Hey,

I have this test.service unit:
[Unit]

[Install]
WantedBy=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/ls -lR
RootImage=/fs
MountAPIVFS=no

The file /fs has a MBR partition table:
Disk /dev/loop0: 1.1 MiB, 1192960 bytes, 2330 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x3990f3e6

Device   Boot Start   End Sectors  Size Id Type
/dev/loop0p1 *   34  23292296  1.1M 83 Linux

In the only partition a squashfs file system containing:
bin
bin/ls
boot
dev
etc
etc/group
etc/nsswitch.conf
etc/passwd
home
lib
lib64
lib64/ld-linux-x86-64.so.2
lib/libc.so.6
lib/libdl.so.2
lib/libpcre.so.3
lib/libpthread.so.0
lib/libselinux.so.1
media
opt
proc
root
run
sbin
srv
sys
tmp
usr
usr/bin
usr/lib
usr/lib64
usr/lib/x86_64-linux-gnu
usr/sbin
var
var/tmp

However, starting the service fails with abort:
Jul 30 13:25:42 machine systemd[1]: Starting test.service...
Jul 30 13:25:42 machine kernel:  loop0: p1
Jul 30 13:25:42 machine systemd[1]: test.service: Main process exited,
code=killed, status=6/ABRT
Jul 30 13:25:42 machine systemd[1]: Failed to start test.service.
Jul 30 13:25:42 machine systemd[1]: test.service: Unit entered failed state.
Jul 30 13:25:42 machine systemd[1]: test.service: Failed with result
'signal'.

It looks like systemd successfully mounts the file system, but then
suddenly aborts before even looking inside the file system:

2761  mount("/dev/loop1p1", "/run/systemd/unit-root", "squashfs",
MS_NODEV, NULL 
2761  <... mount resumed> ) = 0
2761  rt_sigprocmask(SIG_UNBLOCK, [ABRT],  
2761  <... rt_sigprocmask resumed> NULL, 8) = 0
2761  rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
2761  getpid( 
2761  <... getpid resumed> )= 2761
2761  gettid( 
2761  <... gettid resumed> )= 2761
2761  tgkill(2761, 2761, SIGABRT 
2761  <... tgkill resumed> )= 0
2761  rt_sigprocmask(SIG_SETMASK, [],  
2761  <... rt_sigprocmask resumed> NULL, 8) = 0
2761  --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2761,
si_uid=0} ---
2761  +++ killed by SIGABRT +++

The file system can be mounted by hand with losetup and mount, and
/bin/ls can be run from chroot. So I think everything should be OK but
RootImage still does not work and the error messages are useless.
Perhaps I miss some RootImage requirements? What exactly they are?

The same file system image without any partition table does not work either.

This is with Debian systemd version 234-2.

-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Any reason why /run and /dev/shm do not have MS_NOEXEC flags set?

2017-02-01 Thread Topi Miettinen
On 02/01/17 13:13, Hoyer, Marko (ADITG/SW2) wrote:
> Hi,
> 
> thanks to all for your fast feedback. I'll kick off an internal discussion 
> based on the facts you delivered to find out if our people actually want what 
> they want ;)

Filesystem W^X is a nice idea, but considering scripting or other (even
unintentional) Turing complete interpreters in a system, its not very
strong protection. See also
https://lwn.net/Articles/708196/

In my setup I have mounted /run with noexec, but /run/user/* still exec.
Then for each service you can enable systemd directive ProtectHome=true
which makes /run/user inaccessible.

Likewise for /dev/shm, you can check if it is needed by each service at
all and make it completely inaccessible if so, rather than making it
globally noexec.

-Topi

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] About http://0pointer.net/blog/avoiding-cve-2016-8655-with-systemd.html

2016-12-09 Thread Topi Miettinen
On 12/09/16 00:56, Michael Biebl wrote:
> Btw, I think we are lacking a good systemd sandboxing howto/tutorial.
> The one linked from fdo
> (http://0pointer.de/blog/projects/security.html) is pretty dated and
> the systemd.exec man page is not coherent enough with regards to
> security/sandboxing.
> 
> Related to that, I think it would be good if we would annotate in the
> man page, which sandboxing features work for user services and which
> don't. It's not always immediately obvious which feature requires root
> privileges.

Agreed. I started making a tool that helps with the systemd service unit
settings. It's not finished (is any software ever finished), but can
generate reasonable values from a representative test run of the service.

Please check out:
https://github.com/topimiettinen/systemd-settings-generator/blob/master/strace.stp

Earlier announcement:
https://lists.freedesktop.org/archives/systemd-devel/2016-August/037310.html

-Topi

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] deny access to GPU devices

2016-11-11 Thread Topi Miettinen
On 11/11/16 20:09, Lennart Poettering wrote:
> I have no idea what "slurm" is, but do note that the "devices" cgroup
> controller has no future, it is unlikely to ever become available in
> cgroupsv2.

This is unwelcome news, I think it is a simple and well contained MAC
that has been available in systems without a full blown MAC like SELinux
and with systemd support it has been very easy to set up. What will
happen to DevicePolicy, DeviceAllow etc. directives? Or will systemd
stick to cgroupsv1 forever?

> Device access to local users is normally managed through ACLs on the
> device node, via udev/logind's "uaccess" logic. Using the "devices"
> cgroup controller for this appears pretty misguided...

ACLs only limit access via the path that they are controlling, device
cgroup controlled the whole system. And if you have a MAC system that
can do that, it could perform the same task as ACLs but in a much better
way.

With cgroup you could also deny access to nodes that need to be
available for interactive users (like TTYs, audio, input devices, GPUs,
USB devices), but which are not useful for system services. Perhaps some
sort of ACL could be constructed with the same effect.

> Also, were does 195 come from? is that a hardcoded major of the
> closed-source nvidia driver? Yuck, code really shouldn't hardcode
> major/minor numbers these days... And sec

The reason seems to be that kernel devs chose not to expose the required
API to non-GPL modules, probably to give pressure to switch to GPL. As
that has not happened, the situation is not optimal for end user point
of view, but it's certainly within the devs' and module authors' rights
to continue using incompatible licences.

NVIDIA tackles this by shipping a SUID root helper nvidia-modprobe,
which is of course even worse from security point of view but it works.
This also highlights why having the device cgroup is a good idea, for
example the helper could be fooled to create new device nodes without
the ACLs. In my setup I have disabled the helper and the device nodes
are created with tmpfiles, which means I'm able to remove CAP_MKNOD
capability and any device cgroup 'm' rights from Xorg service running as
non-root user.

-Topi

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Automatic generation of a part of a service unit file

2016-08-07 Thread Topi Miettinen
Hello,

I made a small systemtap script which can generate a part of
configuration for a systemd service. When run, first it produces
strace-like output which is annotated with information gathered from
various kernel probes. When the process exits, a summary is generated in
systemd unit format. The purpose of the script is to help system
administrators, distro maintainers and program developers to prepare
better unit files.

The systemtap probes check for:
 * capability use for CapabilityBoundingSet
 * device access for DeviceAllow
 * address family use for RestrictAddressFamilies
 * RLIMIT related information

Also file system accesses are checked against ProtectSystem/ProtectHome
requirements and mmap()/mprotect() flags against what is needed by
MemoryDenyWriteExecute=true. For example, if the process never writes to
/boot, /etc or /usr, we can set ProtectSystem=full, but if the script
detects a write access, this is degraded to ProtectSystem=true or
ProtectSystem=false. For InaccessibleDirectories, the user should
specify the list of candidate paths which can be made inaccessible. If
there's any FS access to those paths, they are dropped from the list,
otherwise the remaining paths are proposed as inaccessible.

A list of system calls is produced for SystemCallFilter.

When in doubt, the strace part can be used to verify how the summary was
produced.

Short sample output with wpa_supplicant (looks better on a wide terminal
without line wrapping):
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0) = 4
[RestrictAddressFamilies=AF_UNIX] [NOFILE 3 -> 4]
socket(PF_NETLINK, SOCK_RAW, 0) = 5 [RestrictAddressFamilies=AF_NETLINK]
connect(4, {AF_UNIX, "/var/run/dbus/system_bus_socket"}, 33) = 0
[ReadWriteDirectories=/run/dbus/system_bus_socket ]
[InaccessibleDirectories=~/ /run /run/dbus /run/dbus/system_bus_socket
/var ]
open("/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 14
[DeviceAllow=/dev/char/1:9 r ] [NOFILE 13 -> 14]
[InaccessibleDirectories=~/ /dev /dev/urandom ]

Summary:
CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_RAW
# Consider also possibly missing CapabilityBoundingSet=CAP_SYS_ADMIN
ProtectHome=true
ProtectSystem=full
DevicePolicy=strict
DeviceAllow=/dev/char/1:3 rw
DeviceAllow=/dev/char/1:8 r
DeviceAllow=/dev/char/1:9 r
DeviceAllow=/dev/char/10:58 r
# LimitFSIZE=0
# LimitDATA=577536
# LimitSTACK=139264
# LimitCORE=0
# LimitNOFILE=15
# LimitAS=45146112
# LimitNPROC=159
# LimitMEMLOCK=0
# LimitSIGPENDING=0
# LimitMSGQUEUE=0
# LimitNICE=0
# LimitRTPRIO=0
RestrictAddressFamilies=AF_UNIX AF_INET AF_NETLINK AF_PACKET
MemoryDenyWriteExecute=true
SystemCallFilter=access alarm arch_prctl bind brk chmod clock_getres
clock_gettime close connect execve exit_group fcntl fstat geteuid
getresgid getresuid getrlimit getsockname getuid ioctl mkdir mmap
mprotect munmap open poll read recvfrom recvmsg rmdir rt_sigaction
rt_sigprocmask rt_sigreturn rt_sigsuspend select sendmsg sendto
set_robust_list set_tid_address setsockopt socket statfs unlink write
InaccessibleDirectories=-/bin
InaccessibleDirectories=-/boot
InaccessibleDirectories=-/dev/hugepages
InaccessibleDirectories=-/dev/mqueue
InaccessibleDirectories=-/dev/pts
InaccessibleDirectories=-/dev/shm
InaccessibleDirectories=-/home
InaccessibleDirectories=-/lost+found
InaccessibleDirectories=-/media
InaccessibleDirectories=-/mnt
InaccessibleDirectories=-/opt
InaccessibleDirectories=-/proc/bus
InaccessibleDirectories=-/proc/sys
InaccessibleDirectories=-/root
InaccessibleDirectories=-/srv
InaccessibleDirectories=-/tmp
InaccessibleDirectories=-/usr/bin
InaccessibleDirectories=-/usr/sbin
InaccessibleDirectories=-/var/tmp
ReadOnlyDirectories=/
ReadWriteDirectories=/dev/null /run /run/dbus/system_bus_socket
/run/wpa_supplicant /run/wpa_supplicant/wlan0 socket:[38833]

This is pretty much valid (with some editing, for example last line only
really needs /run/wpa_supplicant) for my system, even if the script is
not yet perfect. I do not trust the RLIMIT values yet and systemtap
itself causes problems (needs to be run as root, system call names don't
match seccomp). Perhaps there should be a way for systemd to start the
process directly without staprun in between and then tell staprun about
the process.

I suppose the script could find a home either with systemd repository
(as it's fairly specific to systemd), systemtap (it's just another
script) or just somewhere in github if nobody cares. Would it be
interesting for systemd?

For future features, it may be possible to probe what kind of settings
for NoNewPrivileges or SecureBits could be used. This could need small
changes to kernel. PrivateTmp and PrivateNetwork may be possible to be
generated in some cases, MountFlags probably not.

-Topi
#! /bin/sh

# suppress some run-time errors here for cleaner output
//bin/true && exec stap --suppress-handler-errors --skip-badvars $0 ${1+"$@"}

/*
 * Compile:
 * stap -p4 -DSTP_NO_OVERLOAD -m strace
 * Run:
 * /usr/bin/staprun -R -c "/sbin/wpa_supplicant -u -O /run/wpa_supplicant -c 

[systemd-devel] Easier alternative to SystemCallFilter

2016-04-16 Thread Topi Miettinen
Hello,

SystemCallFilter, while a nice feature, is not easy to use because there
are hundreds of system calls to be managed.

I'm proposing to add a simpler way to prepare seccomp filters (to
complement SystemCallFilter), where the user can construct the filter by
using predefined system call groups or sets.

Looking at the system calls, they can be arranged in roughly following
groups/sets:

SysAdmin: mknod, [u]mount, pivot_root, init_module, set*id, reboot and
so forth, all system calls where superuser capabilities are always needed.

NetworkIO: socket, bind, connect, sendto, listen, accept etc.

FileIO: most of the calls, open, read, ioctl etc.

IPC: mq_*, msg*, sem*, shm*

Exec (poor name): clone, execve, [v]fork, unshare

For finer tuning, the sets could be subdivided with subgroups like
NetworkIOSend (only sendto etc.), NetworkIOReceive and NetworkIOGeneral
(for the rest), so that NetworkIO = NetworkIOGeneral + NetworkIOSend +
NetworkIOReceive. The possibilities to subdivide are of course endless,
but as the overall goal is to make user's life simpler, it would not
make sense to introduce hundreds of groups because then we'd be back to
where we started.

An example of the syntax would be:

Blacklisting version:
SystemCallFilterSet=~SysAdmin NetworkIOWrite

The same as whitelist:
SystemCallFilterSet=FileIO IPC Exec NetworkGeneral NetworkIOReceve

SystemCallFilter lines would then modify the filters created by the
SystemCallFilterSet instead of starting from scratch.

Alternatively SystemCallFilter syntax could be enhanced with the sets.
But then an old (downgraded) systemd would not understand the new syntax
and it would reject the entire line, which would remove all filtering.

Maybe this feature could be updated when kernel (or more likely seccomp)
introduces new syscalls without changing the syntax. For example, if a
new privileged system call blinkenlights is added by kernel 7.3, it
could be added to SysAdmin set. The downside is that the filter would
expand when whitelisting and it would not be exactly same for different
kernel or systemd versions.

-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] units: add SecureBits

2015-04-24 Thread Topi Miettinen
On 04/24/15 14:52, Lennart Poettering wrote:
 On Sat, 14.02.15 12:32, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 Sorry for the late response, still going through piles of mail.
 
 No setuid programs are expected to be executed, so add
 SecureBits=no-setuid-fixup no-setuid-fixup-locked
 to unit files.

 So, hmm, after reading the man page again: what's the rationale for
 precisely these bits?

 I mean no-setuid-fixup seems to be something that applies to setuid(),
 setresuid() calls and suchlike, which seems pretty uninteresting. Much
 more interesting is SECBIT_NOROOT, which disables suid binary
 handling...

 Yes, noroot noroot-locked was actually my intention, sorry. I'll update
 the patch.

 Maybe all of noroot noroot-locked no-setuid-fixup
 no-setuid-fixup-locked would be OK, but that probably needs another
 look at the programs if they switch UIDs.

 I'd be careful with more than noroot, since the other flags alter
 bbehaviour across setuid() and similar calls, and much of our code
 makes assumptions that will likely not hold if you set those bits...

 Going back to no-setuid-fixup no-setuid-fixup-locked: First, I looked at
 the kernel code if it matches the description on the manual page
 capabilities(7) to prevent more embarrassment. In this case it does,
 NO_SETUID_FIXUP prevents capability changes when the task calls
 set[res]*uid().
 
 Any kind of changes? Both dropping and acquiring them? I mean, I think
 we should actually allow dropping them unless we explicitly say no
 during one transition.

Both ways IIRC.

 I have the suspicion that the SECBIT_NOROOT thing is the only
 really interesting one...

I think they are all pretty much useless. SECURE_NOROOT could be
improved by splitting it to two bits, one controlling setuid execution
and the other to control whether capabilities are dropped at exec(). At
least the manual page should be in synch with reality.

Perhaps MAC systems would be better to enforce capability limits
throughout the whole system. Unfortunately my favourite, TOMOYO, does
not manage capabilities.

 There's of course the question whether no-setuid-fixup
 no-setuid-fixup-locked is useful. For daemons runnig as root, it would
 not help anything (or could even make things worse e.g. in the library
 case). But when the daemon runs as a different user, the flags could
 make the life of attacker a tiny bit more difficult. This leaves only:
 systemd-journal-gatewayd.service
 systemd-journal-remote.service
 systemd-journal-upload.service

 I can make a patch for those if you agree, or the original patch can be
 applied selectively.

 Maybe more daemons should run as unprivileged user.
 
 I think all long-running ones that reasonably can already do. I mean,
 things like logind simple need too many caps, it's really not worth
 trying to make them run under a different uid, because they have so
 much privs otherwise...

 Which daemons do you precisely have in mind?

Nothing in particular. Privilege separation could help even in cases
where some caps need to be retained.

-Topi

 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH/RFC] FuseMAC: user space MAC in systemd

2015-03-09 Thread Topi Miettinen
On 03/08/15 23:16, Lennart Poettering wrote:
 On Mon, 02.03.15 22:49, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 Intercept and filter filesystem operations of processes launched
 by systemd with FUSE.

 Implement learning, enforcing and auto enforcing/learning modes,
 enabled with new exec directive FuseMAC.

 FS operations can be filtered by access type (e.g. getattr/read,
 cf. AppArmor or TOMOYO Linux) or for more fine grained control,
 which area of the file is being accessed.

 Due to limitations of FUSE, API file systems can't be intercepted.

 Also the patch seems to trigger bugs in kernel (hang CPU).
 
 Hmm, if I understand this patch right, then you proxy all file system
 access through a userspace fuse tool to enforce additional access
 restrictions?

All except file systems like /dev, /proc, /sys etc.

 Well, I am pretty sure that systemd should not be in the business of
 implementing a new access control mechanism. It's fine to expose ones
 that are supported in the kernel in ways, or even using things like
 namespacing to implement access control, but it really shouldn't be
 systemd that is the one enforcing file access rights here, I am very
 sure. It might be the place to encode and configure policy, but not
 the place to enforce it.

OK. It may be better design to keep it separate anyway, for example library.

 I think this is better done outside of systemd, and quite frankly, in
 the kernel, already for performance reasons.

Some of the access control is already performed by kernel (due to flag
default_permissions), I'm only adding on top of that. Performance isn't
so bad either.

Not doing the access control in kernel is actually the strength here. I
don't think it would be reasonable to expect kernel to support fine
grained byte range type of access control or automatic mode. From user
space it's possible to perform very complex checks, involve external
state and even query the user.

For example, one of the nice ideas in TOMOYO Linux is the ability to
base rules on the ancestry of a process, like kernel /sbin/init
/usr/bin/dbus-daemon is different from kernel /sbin/init
/usr/bin/kdm /usr/bin/dbus-daemon. This would be pretty easy to
implement to FuseMAC. I'm not sure this would be possible with SELinux
and it's a weakness in AppArmor.

Byte ranges let FuseMAC differentiate between, for example, 'bash -c
exit' and 'bash -c echo foo;exit', which is far beyond what any current
MAC can do.

The kernel could be a bit more helpful, now it's not possible to know if
reading a file happens due to mmap() or read(). Maybe this information
could be delivered by FUSE as well, or we could ptrace() the process.
Then FuseMAC would match AppArmor's 'm' (mmap) flag.

 Sorry!

NP.

-Topi

 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] refactored Re: [PATCH] nspawn: Map all seccomp filters to matching capabilities

2015-03-03 Thread Topi Miettinen
On 03/03/15 01:28, Jay Faulkner wrote:
 Hey,
 
 Lennart reviewed this in IRC and suggested I refactor the change in this
 manner. Now, we have an array of capability:sys call pairs, and iterate
 through that and then only add the seccomp filter if the capability
 doesn’t exist.
 
 The new patch is attached, and available
 here: https://github.com/jayofdoom/systemd/pull/5.patch. 

+typedef struct CapSeccompPair {
+uint64_t capability;
+int scmp_syscall_num;
+} CapSeccompPair;
...
+static const CapSeccompPair blacklist[] = {
+{ SCMP_SYS(iopl), CAP_SYS_RAWIO },

The fields are swapped.

-Topi

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH/RFC] FuseMAC: user space MAC in systemd

2015-03-02 Thread Topi Miettinen
(context-fusemac_mode, params-unit_id);
+if (r  0) {
+*exit_status = EXIT_FUSE;
 return r;
 }
 }
+#endif
 
 if (params-apply_chroot) {
 if (context-root_directory)
@@ -2460,6 +2470,9 @@ void exec_context_dump(ExecContext *c, FILE* f, const 
char *prefix) {
 fprintf(f,
 %sAppArmorProfile: %s%s\n,
 prefix, c-apparmor_profile_ignore ? - : , 
c-apparmor_profile);
+fprintf(f,
+%sFuseMAC: %s\n,
+prefix, strna(fusemac_mode_to_string(c-fusemac_mode)));
 }
 
 bool exec_context_maintains_privileges(ExecContext *c) {
@@ -2911,3 +2924,12 @@ static const char* const 
exec_output_table[_EXEC_OUTPUT_MAX] = {
 };
 
 DEFINE_STRING_TABLE_LOOKUP(exec_output, ExecOutput);
+
+static const char *const fusemac_mode_table[_FUSEMAC_MAX] = {
+[FUSEMAC_NO] = no,
+[FUSEMAC_LEARN] = learn,
+[FUSEMAC_ENFORCE] = enforce,
+[FUSEMAC_AUTO] = auto,
+};
+
+DEFINE_STRING_TABLE_LOOKUP(fusemac_mode, FuseMAC);
diff --git a/src/core/execute.h b/src/core/execute.h
index 1a43ac7..68280fd 100644
--- a/src/core/execute.h
+++ b/src/core/execute.h
@@ -63,6 +63,15 @@ typedef enum ExecOutput {
 _EXEC_OUTPUT_INVALID = -1
 } ExecOutput;
 
+typedef enum FuseMAC {
+FUSEMAC_NO,
+FUSEMAC_LEARN,
+FUSEMAC_ENFORCE,
+FUSEMAC_AUTO,
+_FUSEMAC_MAX,
+_FUSEMAC_INVALID = -1
+} FuseMAC;
+
 struct ExecStatus {
 dual_timestamp start_timestamp;
 dual_timestamp exit_timestamp;
@@ -191,6 +200,8 @@ struct ExecContext {
 
 /* custom dbus enpoint */
 BusEndpoint *bus_endpoint;
+
+FuseMAC fusemac_mode;
 };
 
 #include cgroup.h
@@ -265,3 +276,8 @@ ExecOutput exec_output_from_string(const char *s) _pure_;
 
 const char* exec_input_to_string(ExecInput i) _const_;
 ExecInput exec_input_from_string(const char *s) _pure_;
+
+const char* fusemac_mode_to_string(FuseMAC p) _const_;
+FuseMAC fusemac_mode_from_string(const char *s) _pure_;
+
+int setup_fusemac(FuseMAC fusemac_mode, const char *unit_id);
diff --git a/src/core/fusemac.c b/src/core/fusemac.c
new file mode 100644
index 000..e712cdb
--- /dev/null
+++ b/src/core/fusemac.c
@@ -0,0 +1,1118 @@
+/*-*- Mode: C; c-basic-offset: 8; indent-tabs-mode: nil -*-*/
+
+/*
+  Uses some code from fusexmp.c from FUSE:
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi mik...@szeredi.hu
+  Copyright (C) 2011   Sebastian Pipping sebast...@pipping.org
+
+  This program can be distributed under the terms of the GNU GPL.
+  See the file COPYING.
+
+  gcc -Wall fusemac.c `pkg-config fuse --cflags --libs` -o fusemac -DTEST
+*/
+
+/***
+  FuseMAC: FUSE-based Mandatory Access Control
+  Copyright (C) 2015   Topi Miettinen toiwo...@gmail.com
+  FuseMAC modes:
+  Learning: add any operations to permitted set
+  Enforcing: if the operation is not in permitted set, return -EPERM
+  Auto: like enforcing, but if the file has changed, update permitted set for 
that file
+***/
+
+#define FUSE_USE_VERSION 26
+#include assert.h
+#include stdbool.h
+#include sys/types.h
+#include fuse.h
+#include stdlib.h
+#include stdio.h
+#include string.h
+#include unistd.h
+#include fcntl.h
+#include sys/stat.h
+#include dirent.h
+#include errno.h
+#include sys/time.h
+#include sys/xattr.h
+#include sys/mount.h
+#include list.h
+#include mkdir.h
+#ifndef TEST
+#include execute.h
+#include util.h
+#include log.h
+#else
+typedef enum FuseMAC {
+FUSEMAC_NO,
+FUSEMAC_LEARN,
+FUSEMAC_ENFORCE,
+FUSEMAC_AUTO,
+_FUSEMAC_MAX,
+_FUSEMAC_INVALID = -1
+} FuseMAC;
+#define log_debug(...) fprintf(stderr, __VA_ARGS__)
+#define new0(t, n) ((t*) calloc((n), sizeof(t)))
+#endif
+
+enum {
+p_getattr = 0,
+p_access,
+p_readlink,
+p_readdir,
+p_mknod,
+p_mkdir,
+p_symlink,
+p_unlink,
+p_rmdir,
+p_rename,
+p_link,
+p_chmod,
+p_chown,
+p_truncate,
+p_utimens,
+p_open,
+p_read,
+p_write,
+p_statfs,
+p_release,
+p_fsync,
+p_fallocate,
+p_setxattr,
+p_getxattr,
+p_listxattr,
+p_removexattr,
+p_lastop
+};
+
+static const char * const op_names[] = {
+[p_getattr] = getattr,
+[p_access] = access,
+[p_readlink] = readlink,
+[p_readdir] = readdir,
+[p_mknod] = mknod,
+[p_mkdir] = mkdir,
+[p_symlink] = symlink,
+[p_unlink] = unlink,
+[p_rmdir] = rmdir,
+[p_rename] = rename,
+[p_link] = link,
+[p_chmod] = chmod,
+[p_chown] = chown,
+[p_truncate] = truncate,
+[p_utimens] = utimens,
+[p_open] = open,
+[p_read] = read,
+[p_write] = write

Re: [systemd-devel] [PATCH] units: add SecureBits

2015-02-14 Thread Topi Miettinen
On 02/11/15 16:32, Lennart Poettering wrote:
 On Wed, 11.02.15 16:24, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 On 02/10/15 21:00, Lennart Poettering wrote:
 On Sat, 07.02.15 10:40, Topi Miettinen (toiwo...@gmail.com) wrote:

 No setuid programs are expected to be executed, so add
 SecureBits=no-setuid-fixup no-setuid-fixup-locked
 to unit files.

 So, hmm, after reading the man page again: what's the rationale for
 precisely these bits?

 I mean no-setuid-fixup seems to be something that applies to setuid(),
 setresuid() calls and suchlike, which seems pretty uninteresting. Much
 more interesting is SECBIT_NOROOT, which disables suid binary
 handling...

 Yes, noroot noroot-locked was actually my intention, sorry. I'll update
 the patch.

 Maybe all of noroot noroot-locked no-setuid-fixup
 no-setuid-fixup-locked would be OK, but that probably needs another
 look at the programs if they switch UIDs.
 
 I'd be careful with more than noroot, since the other flags alter
 bbehaviour across setuid() and similar calls, and much of our code
 makes assumptions that will likely not hold if you set those bits...

Going back to no-setuid-fixup no-setuid-fixup-locked: First, I looked at
the kernel code if it matches the description on the manual page
capabilities(7) to prevent more embarrassment. In this case it does,
NO_SETUID_FIXUP prevents capability changes when the task calls
set[res]*uid().

Then I looked at all use cases of set[res]*uid() in systemd. They are
called directly by run and nspawn. Then they are used when connecting to
journal and setting up PAM in exec_spawn() as well as in
namespace_enter(). These in turn are used by sd-bus init which is used
by unit setup, so pretty much everything is affected. The good thing is
that none of the daemons except machined use these directly.
drop_privileges() uses set[res]*uid() and it is called from resolved,
coredump, networkd, timesyncd and bus-proxyd.

So avoiding those, at least the following could be candidates:
systemd-hostnamed.service
systemd-importd.service
systemd-journal-gatewayd.service
systemd-journal-remote.service
systemd-journal-upload.service
systemd-journald.service
systemd-localed.service
systemd-logind.service
systemd-timedated.service

It could still be possible that an external library could detect that
uid==0 and then switch uids to do things at lower privilege. This would
work, but the capabilities would not be dropped.

There's of course the question whether no-setuid-fixup
no-setuid-fixup-locked is useful. For daemons runnig as root, it would
not help anything (or could even make things worse e.g. in the library
case). But when the daemon runs as a different user, the flags could
make the life of attacker a tiny bit more difficult. This leaves only:
systemd-journal-gatewayd.service
systemd-journal-remote.service
systemd-journal-upload.service

I can make a patch for those if you agree, or the original patch can be
applied selectively.

Maybe more daemons should run as unprivileged user.

-Topi

 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] units: add SecureBits

2015-02-11 Thread Topi Miettinen
On 02/10/15 21:00, Lennart Poettering wrote:
 On Sat, 07.02.15 10:40, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 No setuid programs are expected to be executed, so add
 SecureBits=no-setuid-fixup no-setuid-fixup-locked
 to unit files.
 
 So, hmm, after reading the man page again: what's the rationale for
 precisely these bits?
 
 I mean no-setuid-fixup seems to be something that applies to setuid(),
 setresuid() calls and suchlike, which seems pretty uninteresting. Much
 more interesting is SECBIT_NOROOT, which disables suid binary
 handling...

Yes, noroot noroot-locked was actually my intention, sorry. I'll update
the patch.

Maybe all of noroot noroot-locked no-setuid-fixup
no-setuid-fixup-locked would be OK, but that probably needs another
look at the programs if they switch UIDs.

-Topi

 
 Can you elaborate?
 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH v2] units: add SecureBits

2015-02-11 Thread Topi Miettinen
No setuid programs are expected to be executed, so add
SecureBits=noroot noroot-locked
to unit files.
---
 units/systemd-hostnamed.service.in| 1 +
 units/systemd-importd.service.in  | 1 +
 units/systemd-journal-gatewayd.service.in | 1 +
 units/systemd-journal-remote.service.in   | 1 +
 units/systemd-journal-upload.service.in   | 1 +
 units/systemd-journald.service.in | 1 +
 units/systemd-localed.service.in  | 1 +
 units/systemd-logind.service.in   | 1 +
 units/systemd-machined.service.in | 1 +
 units/systemd-networkd.service.in | 1 +
 units/systemd-resolved.service.in | 1 +
 units/systemd-timedated.service.in| 1 +
 units/systemd-timesyncd.service.in| 1 +
 13 files changed, 13 insertions(+)

diff --git a/units/systemd-hostnamed.service.in 
b/units/systemd-hostnamed.service.in
index cc88ecd..259b451 100644
--- a/units/systemd-hostnamed.service.in
+++ b/units/systemd-hostnamed.service.in
@@ -14,6 +14,7 @@ 
Documentation=http://www.freedesktop.org/wiki/Software/systemd/hostnamed
 ExecStart=@rootlibexecdir@/systemd-hostnamed
 BusName=org.freedesktop.hostname1
 CapabilityBoundingSet=CAP_SYS_ADMIN
+SecureBits=noroot noroot-locked
 WatchdogSec=1min
 PrivateTmp=yes
 PrivateDevices=yes
diff --git a/units/systemd-importd.service.in b/units/systemd-importd.service.in
index 26759ea..189c763 100644
--- a/units/systemd-importd.service.in
+++ b/units/systemd-importd.service.in
@@ -14,6 +14,7 @@ ExecStart=@rootlibexecdir@/systemd-importd
 BusName=org.freedesktop.import1
 CapabilityBoundingSet=CAP_CHOWN CAP_FOWNER CAP_FSETID CAP_MKNOD CAP_SETFCAP 
CAP_SYS_ADMIN CAP_SETPCAP
 NoNewPrivileges=yes
+SecureBits=noroot noroot-locked
 WatchdogSec=1min
 PrivateTmp=yes
 ProtectSystem=full
diff --git a/units/systemd-journal-gatewayd.service.in 
b/units/systemd-journal-gatewayd.service.in
index 987220e..f15a37f 100644
--- a/units/systemd-journal-gatewayd.service.in
+++ b/units/systemd-journal-gatewayd.service.in
@@ -11,6 +11,7 @@ Requires=systemd-journal-gatewayd.socket
 
 [Service]
 ExecStart=@rootlibexecdir@/systemd-journal-gatewayd
+SecureBits=noroot noroot-locked
 User=systemd-journal-gateway
 Group=systemd-journal-gateway
 SupplementaryGroups=systemd-journal
diff --git a/units/systemd-journal-remote.service.in 
b/units/systemd-journal-remote.service.in
index 4a898d6..afa35e6 100644
--- a/units/systemd-journal-remote.service.in
+++ b/units/systemd-journal-remote.service.in
@@ -13,6 +13,7 @@ Requires=systemd-journal-remote.socket
 ExecStart=@rootlibexecdir@/systemd-journal-remote \
   --listen-https=-3 \
   --output=/var/log/journal/remote/
+SecureBits=noroot noroot-locked
 User=systemd-journal-remote
 Group=systemd-journal-remote
 PrivateTmp=yes
diff --git a/units/systemd-journal-upload.service.in 
b/units/systemd-journal-upload.service.in
index b2e3c76..f8524ca 100644
--- a/units/systemd-journal-upload.service.in
+++ b/units/systemd-journal-upload.service.in
@@ -12,6 +12,7 @@ After=network.target
 [Service]
 ExecStart=@rootlibexecdir@/systemd-journal-upload \
   --save-state
+SecureBits=noroot noroot-locked
 User=systemd-journal-upload
 PrivateTmp=yes
 PrivateDevices=yes
diff --git a/units/systemd-journald.service.in 
b/units/systemd-journald.service.in
index a3540c6..b48e4ad 100644
--- a/units/systemd-journald.service.in
+++ b/units/systemd-journald.service.in
@@ -22,6 +22,7 @@ RestartSec=0
 NotifyAccess=all
 StandardOutput=null
 CapabilityBoundingSet=CAP_SYS_ADMIN CAP_DAC_OVERRIDE CAP_SYS_PTRACE CAP_SYSLOG 
CAP_AUDIT_CONTROL CAP_AUDIT_READ CAP_CHOWN CAP_DAC_READ_SEARCH CAP_FOWNER 
CAP_SETUID CAP_SETGID CAP_MAC_OVERRIDE
+SecureBits=noroot noroot-locked
 WatchdogSec=1min
 FileDescriptorStoreMax=1024
 
diff --git a/units/systemd-localed.service.in b/units/systemd-localed.service.in
index bfa0978..d2fbf30 100644
--- a/units/systemd-localed.service.in
+++ b/units/systemd-localed.service.in
@@ -14,6 +14,7 @@ 
Documentation=http://www.freedesktop.org/wiki/Software/systemd/localed
 ExecStart=@rootlibexecdir@/systemd-localed
 BusName=org.freedesktop.locale1
 CapabilityBoundingSet=
+SecureBits=noroot noroot-locked
 WatchdogSec=1min
 PrivateTmp=yes
 PrivateDevices=yes
diff --git a/units/systemd-logind.service.in b/units/systemd-logind.service.in
index f087e99..471278a 100644
--- a/units/systemd-logind.service.in
+++ b/units/systemd-logind.service.in
@@ -24,6 +24,7 @@ Restart=always
 RestartSec=0
 BusName=org.freedesktop.login1
 CapabilityBoundingSet=CAP_SYS_ADMIN CAP_MAC_ADMIN CAP_AUDIT_CONTROL CAP_CHOWN 
CAP_KILL CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_SYS_TTY_CONFIG
+SecureBits=noroot noroot-locked
 WatchdogSec=1min
 
 # Increase the default a bit in order to allow many simultaneous
diff --git a/units/systemd-machined.service.in 
b/units/systemd-machined.service.in
index 15f34d9..0cb823e 100644
--- a/units/systemd-machined.service.in
+++ b/units/systemd-machined.service.in
@@ -16,6 +16,7 @@ After=machine.slice
 

[systemd-devel] [PATCH] units: add SecureBits

2015-02-07 Thread Topi Miettinen
No setuid programs are expected to be executed, so add
SecureBits=no-setuid-fixup no-setuid-fixup-locked
to unit files.
---
 units/systemd-hostnamed.service.in| 1 +
 units/systemd-importd.service.in  | 1 +
 units/systemd-journal-gatewayd.service.in | 1 +
 units/systemd-journal-remote.service.in   | 1 +
 units/systemd-journal-upload.service.in   | 1 +
 units/systemd-journald.service.in | 1 +
 units/systemd-localed.service.in  | 1 +
 units/systemd-logind.service.in   | 1 +
 units/systemd-machined.service.in | 1 +
 units/systemd-networkd.service.in | 1 +
 units/systemd-resolved.service.in | 1 +
 units/systemd-timedated.service.in| 1 +
 units/systemd-timesyncd.service.in| 1 +
 13 files changed, 13 insertions(+)

diff --git a/units/systemd-hostnamed.service.in 
b/units/systemd-hostnamed.service.in
index cc88ecd..ec13938 100644
--- a/units/systemd-hostnamed.service.in
+++ b/units/systemd-hostnamed.service.in
@@ -14,6 +14,7 @@ 
Documentation=http://www.freedesktop.org/wiki/Software/systemd/hostnamed
 ExecStart=@rootlibexecdir@/systemd-hostnamed
 BusName=org.freedesktop.hostname1
 CapabilityBoundingSet=CAP_SYS_ADMIN
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
 WatchdogSec=1min
 PrivateTmp=yes
 PrivateDevices=yes
diff --git a/units/systemd-importd.service.in b/units/systemd-importd.service.in
index 26759ea..bb3fbea 100644
--- a/units/systemd-importd.service.in
+++ b/units/systemd-importd.service.in
@@ -14,6 +14,7 @@ ExecStart=@rootlibexecdir@/systemd-importd
 BusName=org.freedesktop.import1
 CapabilityBoundingSet=CAP_CHOWN CAP_FOWNER CAP_FSETID CAP_MKNOD CAP_SETFCAP 
CAP_SYS_ADMIN CAP_SETPCAP
 NoNewPrivileges=yes
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
 WatchdogSec=1min
 PrivateTmp=yes
 ProtectSystem=full
diff --git a/units/systemd-journal-gatewayd.service.in 
b/units/systemd-journal-gatewayd.service.in
index 987220e..bfdb561 100644
--- a/units/systemd-journal-gatewayd.service.in
+++ b/units/systemd-journal-gatewayd.service.in
@@ -11,6 +11,7 @@ Requires=systemd-journal-gatewayd.socket
 
 [Service]
 ExecStart=@rootlibexecdir@/systemd-journal-gatewayd
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
 User=systemd-journal-gateway
 Group=systemd-journal-gateway
 SupplementaryGroups=systemd-journal
diff --git a/units/systemd-journal-remote.service.in 
b/units/systemd-journal-remote.service.in
index 4a898d6..4f25518 100644
--- a/units/systemd-journal-remote.service.in
+++ b/units/systemd-journal-remote.service.in
@@ -13,6 +13,7 @@ Requires=systemd-journal-remote.socket
 ExecStart=@rootlibexecdir@/systemd-journal-remote \
   --listen-https=-3 \
   --output=/var/log/journal/remote/
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
 User=systemd-journal-remote
 Group=systemd-journal-remote
 PrivateTmp=yes
diff --git a/units/systemd-journal-upload.service.in 
b/units/systemd-journal-upload.service.in
index b2e3c76..ac776ac 100644
--- a/units/systemd-journal-upload.service.in
+++ b/units/systemd-journal-upload.service.in
@@ -12,6 +12,7 @@ After=network.target
 [Service]
 ExecStart=@rootlibexecdir@/systemd-journal-upload \
   --save-state
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
 User=systemd-journal-upload
 PrivateTmp=yes
 PrivateDevices=yes
diff --git a/units/systemd-journald.service.in 
b/units/systemd-journald.service.in
index a3540c6..01bf2a7 100644
--- a/units/systemd-journald.service.in
+++ b/units/systemd-journald.service.in
@@ -22,6 +22,7 @@ RestartSec=0
 NotifyAccess=all
 StandardOutput=null
 CapabilityBoundingSet=CAP_SYS_ADMIN CAP_DAC_OVERRIDE CAP_SYS_PTRACE CAP_SYSLOG 
CAP_AUDIT_CONTROL CAP_AUDIT_READ CAP_CHOWN CAP_DAC_READ_SEARCH CAP_FOWNER 
CAP_SETUID CAP_SETGID CAP_MAC_OVERRIDE
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
 WatchdogSec=1min
 FileDescriptorStoreMax=1024
 
diff --git a/units/systemd-localed.service.in b/units/systemd-localed.service.in
index bfa0978..f0c06aa 100644
--- a/units/systemd-localed.service.in
+++ b/units/systemd-localed.service.in
@@ -14,6 +14,7 @@ 
Documentation=http://www.freedesktop.org/wiki/Software/systemd/localed
 ExecStart=@rootlibexecdir@/systemd-localed
 BusName=org.freedesktop.locale1
 CapabilityBoundingSet=
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
 WatchdogSec=1min
 PrivateTmp=yes
 PrivateDevices=yes
diff --git a/units/systemd-logind.service.in b/units/systemd-logind.service.in
index f087e99..f6760c6 100644
--- a/units/systemd-logind.service.in
+++ b/units/systemd-logind.service.in
@@ -24,6 +24,7 @@ Restart=always
 RestartSec=0
 BusName=org.freedesktop.login1
 CapabilityBoundingSet=CAP_SYS_ADMIN CAP_MAC_ADMIN CAP_AUDIT_CONTROL CAP_CHOWN 
CAP_KILL CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_SYS_TTY_CONFIG
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
 WatchdogSec=1min
 
 # Increase the default a bit in order to allow many simultaneous
diff --git a/units/systemd-machined.service.in 

Re: [systemd-devel] [PATCH] timesyncd: tighten unit file

2015-02-01 Thread Topi Miettinen
On 01/27/15 22:19, Lennart Poettering wrote:
 On Tue, 27.01.15 21:58, Topi Miettinen (toiwo...@gmail.com) wrote:
 Well, the way I read the paragraph above if we set PR_SET_NO_NEW_PRIVS
 after forking, but before execing systemd-timesyncd, and that binary
 has an selinux label on it, that the selinux label would be ignore,
 and things would continue to run with the same label as we forked
 off. This pretty much renders SELinux usesless, since all services
 where we set the bit for would then run as init_t... and that's
 something we really shouldn't do.

 seccomp_filter.txt on the other hand says that
 Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
 run with CAP_SYS_ADMIN privileges in its namespace.  If these are not
 true, -EACCES will be returned.  This requirement ensures that filter
 programs cannot be applied to child processes with greater privileges
 than the task that installed them.

 Does this mean that SELinux and seccomp are mutually exclusive? Awful
 design if so.
 
 Well, no it doesn't mean that. If systemd sets up a seccomp filter it
 has CAP_SYS_ADMIN, hence all is good. And it can leave
 PR_SET_NO_NEW_PRIVS off, and thus not break selinux.

So in conclusion, only the SecureBits part would be interesting. Looking
at the unit files, which of them expect to run setuid helpers? I looked
at the unit files and briefly at the corresponding programs (also to get
to know them better), so I offer my guesses below:

units/console-getty.service.m4.in

getty launches user shell, so no.

units/console-shell.service.m4.in

sulogin launches root shell, so no.

units/container-ge...@.service.m4.in

getty launches user shell, so no.

units/debug-shell.service.in

sushell launches root shell, so no.

units/emergency.service.in

sulogin launches root shell, so no.

units/getty@.service.m4

getty launches user shell, so no.

units/halt-local.service.in

Local scripts could launch anything, so no.

units/initrd-cleanup.service.in

Only calls systemctl, so yes.

units/initrd-parse-etc.service.in

Only calls systemctl, so yes.

units/initrd-switch-root.service.in

Only calls systemctl, so yes.

units/initrd-udevadm-cleanup-db.service.in

Only calls udevadm, so yes.

units/kmod-static-nodes.service.in

kmod is not part of systemd, but I wouldn't expect it to use setuid
helpers, so yes?

units/ldconfig.service

ldconfig is not part of systemd, but I wouldn't expect it to use setuid
helpers, so yes?

units/quotaon.service.in

quotaon is not part of systemd, but I wouldn't expect it to use setuid
helpers, so yes?

units/rc-local.service.in

Local scripts could launch anything, so no.

units/rescue.service.in

sulogin launches root shell, so no.

units/serial-getty@.service.m4

getty launches user shell, so no.

units/systemd-ask-password-console.service.in

Only calls systemd-ask-password-agent, so yes.

units/systemd-ask-password-wall.service.in

Only calls systemd-ask-password-agent, so yes.

units/systemd-backli...@.service.in

Only calls systemd-backlight, so yes.

units/systemd-binfmt.service.in

Only calls systemd-binfmt, so yes.

units/systemd-bootchart.service.in

Only calls systemd-bootchart, so yes.

units/systemd-bus-proxyd.service.m4.in

Only calls systemd-bus-proxyd, so yes.

units/systemd-firstboot.service.in

Only calls systemd-firstboot. It doesn't seem to use pam_unix (which
could launch setuid helper unix_chkpwd) for checking passwords, but
internal implementation, so yes.

units/systemd-fsck-root.service.in

Only calls systemd-fsck, so yes.

units/systemd-f...@.service.in

Only calls systemd-fsck, so yes.

units/systemd-halt.service.in

Only calls systemctl, so yes.

units/systemd-hibernate-res...@.service.in

Only calls systemd-hibernate-resume, so yes.

units/systemd-hibernate.service.in

Only calls systemd-sleep, so yes.

units/systemd-hostnamed.service.in

Only calls systemd-hostnamed, so yes.

units/systemd-hwdb-update.service.in

Only calls systemd-hwdb, so yes.

units/systemd-hybrid-sleep.service.in

Only calls systemd-sleep, so yes.

units/systemd-importd.service.in

Calls systemd-importd, which in turn executes several helpers,
especially gpg which could be setuid to lock memory, so no.

units/systemd-initctl.service.in

Only calls systemd-initctl, so yes.

units/systemd-journal-catalog-update.service.in

Only calls systemd-journalctl, so yes.

units/systemd-journald.service.in

Only calls systemd-journald, so yes.

units/systemd-journal-flush.service.in

Only calls systemd-journalctl, so yes.

units/systemd-journal-gatewayd.service.in

Only calls systemd-journal-gatewayd, so yes.

units/systemd-journal-remote.service.in

Calls systemd-journal-remote, which in turn uses microhttpd that uses
several external libs, maybe no?

units/systemd-journal-upload.service.in

Only calls systemd-journal-upload (uses curl), so yes.

units/systemd-kexec.service.in

Only calls systemctl, so yes.

units/systemd-localed.service.in

Only calls systemd-localed, so yes.

units/systemd

Re: [systemd-devel] [PATCH] backlight: let udev properties override clamping

2015-01-31 Thread Topi Miettinen
On 01/29/15 00:09, Lennart Poettering wrote:
 On Wed, 28.01.15 23:51, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 diff --git a/src/backlight/backlight.c b/src/backlight/backlight.c
 index 1271a66..df53b75 100644
 --- a/src/backlight/backlight.c
 +++ b/src/backlight/backlight.c
 @@ -373,6 +373,7 @@ int main(int argc, char *argv[]) {
  
  if (streq(argv[1], load)) {
  _cleanup_free_ char *value = NULL;
 +const char *clamp;
  
  if (!shall_restore_state())
  return EXIT_SUCCESS;
 @@ -390,7 +391,9 @@ int main(int argc, char *argv[]) {
  return EXIT_FAILURE;
  }
  
 -clamp_brightness(device, value, max_brightness);
 +clamp = udev_device_get_property_value(device, 
 ID_BACKLIGHT_CLAMP);
 +if (clamp == NULL || streq(clamp, 1))
 
 Please use parse_boolean() for this.

OK.

 
 I think it would be better to name this ID_BACKLIGHT_CLAMP_MIN or so.

_MIN as in minimum? The brightness is also clamped to maximum brightness
advertised by the device.

-Topi

 
 Otherwise looks fine to me.
 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] backlight: let udev properties override clamping

2015-01-31 Thread Topi Miettinen
On my computer, the minimum brightness enforced by clamping in
backlight is too bright.

Let udev property ID_BACKLIGHT_CLAMP control whether the brightness
is clamped or not.
---
 man/systemd-backli...@.service.xml | 9 -
 src/backlight/backlight.c  | 5 -
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/man/systemd-backli...@.service.xml 
b/man/systemd-backli...@.service.xml
index 453afbf..21c6437 100644
--- a/man/systemd-backli...@.service.xml
+++ b/man/systemd-backli...@.service.xml
@@ -58,7 +58,14 @@
 is a service that restores the display backlight
 brightness at early boot and saves it at shutdown. On
 disk, the backlight brightness is stored in
-filename/var/lib/systemd/backlight//filename./para
+filename/var/lib/systemd/backlight//filename. During
+loading, if udev property
+optionID_BACKLIGHT_CLAMP/option is not set to
+false value, the brightness is clamped to a value of
+at least 1 or 5% of maximum brightness, whichever is
+greater. This restriction will be removed when the
+kernel allows user space to reliably set a brightness
+value which does not turn off the display./para
 /refsect1
 
 refsect1
diff --git a/src/backlight/backlight.c b/src/backlight/backlight.c
index 1271a66..2d357db 100644
--- a/src/backlight/backlight.c
+++ b/src/backlight/backlight.c
@@ -373,6 +373,7 @@ int main(int argc, char *argv[]) {
 
 if (streq(argv[1], load)) {
 _cleanup_free_ char *value = NULL;
+const char *clamp;
 
 if (!shall_restore_state())
 return EXIT_SUCCESS;
@@ -390,7 +391,9 @@ int main(int argc, char *argv[]) {
 return EXIT_FAILURE;
 }
 
-clamp_brightness(device, value, max_brightness);
+clamp = udev_device_get_property_value(device, 
ID_BACKLIGHT_CLAMP);
+if (clamp == NULL || parse_boolean(clamp))
+clamp_brightness(device, value, max_brightness);
 
 r = udev_device_set_sysattr_value(device, brightness, value);
 if (r  0) {
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] backlight: let udev properties override clamping

2015-01-28 Thread Topi Miettinen
On my computer, the minimum brightness enforced by clamping in
backlight is too bright.

Let udev property ID_BACKLIGHT_CLAMP control whether the brightness
is clamped or not.
---
 man/systemd-backli...@.service.xml | 11 ++-
 src/backlight/backlight.c  |  5 -
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/man/systemd-backli...@.service.xml 
b/man/systemd-backli...@.service.xml
index 453afbf..4105685 100644
--- a/man/systemd-backli...@.service.xml
+++ b/man/systemd-backli...@.service.xml
@@ -58,7 +58,16 @@
 is a service that restores the display backlight
 brightness at early boot and saves it at shutdown. On
 disk, the backlight brightness is stored in
-filename/var/lib/systemd/backlight//filename./para
+filename/var/lib/systemd/backlight//filename. During
+loading, if udev property ID_BACKLIGHT_CLAMP is not
+present or is set to literal1/literal, the
+brightness is clamped to at least value
+literal1/literal or 5% of maximum brightness. This
+is to avoid an unpowered display due to poor API
+design in kernel (unfixed as of 2015-01-28) that does
+not allow user space to know the difference between
+lowest brightness and powering off the
+backlight./para
 /refsect1
 
 refsect1
diff --git a/src/backlight/backlight.c b/src/backlight/backlight.c
index 1271a66..df53b75 100644
--- a/src/backlight/backlight.c
+++ b/src/backlight/backlight.c
@@ -373,6 +373,7 @@ int main(int argc, char *argv[]) {
 
 if (streq(argv[1], load)) {
 _cleanup_free_ char *value = NULL;
+const char *clamp;
 
 if (!shall_restore_state())
 return EXIT_SUCCESS;
@@ -390,7 +391,9 @@ int main(int argc, char *argv[]) {
 return EXIT_FAILURE;
 }
 
-clamp_brightness(device, value, max_brightness);
+clamp = udev_device_get_property_value(device, 
ID_BACKLIGHT_CLAMP);
+if (clamp == NULL || streq(clamp, 1))
+clamp_brightness(device, value, max_brightness);
 
 r = udev_device_set_sysattr_value(device, brightness, value);
 if (r  0) {
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] libudev-monitor: ensure proper string termination

2015-01-27 Thread Topi Miettinen
On 01/27/15 00:19, Lennart Poettering wrote:
 On Sun, 25.01.15 07:10, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 On 01/25/15 03:34, Zbigniew Jędrzejewski-Szmek wrote:
 On Sat, Jan 24, 2015 at 10:39:56AM +0200, Topi Miettinen wrote:
 Leave space for the terminating zero when reading and make sure
 that the last byte is zero. This also makes the check for long packets
 equivalent to code before 9c89c1ca: reject also packets with size 8192.
 ---
  src/libudev/libudev-monitor.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/src/libudev/libudev-monitor.c b/src/libudev/libudev-monitor.c
 index 4cfb2f6..b7fc031 100644
 --- a/src/libudev/libudev-monitor.c
 +++ b/src/libudev/libudev-monitor.c
 @@ -590,7 +590,7 @@ retry:
  if (udev_monitor == NULL)
  return NULL;
  iov.iov_base = buf;
 -iov.iov_len = sizeof(buf);
 +iov.iov_len = sizeof(buf) - 1; /* Leave space for terminating 
 zero */
  memzero(smsg, sizeof(struct msghdr));
  smsg.msg_iov = iov;
  smsg.msg_iovlen = 1;
 @@ -642,6 +642,8 @@ retry:
  if (udev_device == NULL)
  return NULL;
  
 +buf.raw[sizeof(buf.raw) - 1] = '\0';
 +
  if (memcmp(buf.raw, libudev, 8) == 0) {
  /* udev message needs proper version magic */
  if (buf.nlh.magic != htonl(UDEV_MONITOR_MAGIC)) {
 A buffer only needs to be terminated by a zero in certain cases: usually if 
 it
 is passed to a function which expectes that. iovecs can contain binary data,
 and have an explicit size field, so they do not need to be zero-terminated.
 Is there a reason why the buffer has to be zero-terminated in this case?

 String functions strcmp, strlen and strstr, used a few lines later,
 expect null byte terminated strings. Alternatively they could be changed
 to strncmp and friends where the scope can be limited to only the buffer.
 
 But the data comes from the kernel (and that's verified, securely),
 hence I am wondering what kind of vulnerability you have precisely in
 mind. If we don't trust the kernel, then we'll have quite a problem...

Right, I didn't look at the context. In that case, this would be just
defensive programming.

-Topi

 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] timesyncd: tighten unit file

2015-01-27 Thread Topi Miettinen
On 01/27/15 01:54, Lennart Poettering wrote:
 On Sun, 25.01.15 12:23, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.
 
 Hmm, that's not true, is it? load_clock_timestamp() is invoked before
 we drop privs in the daemon. And it certainly calls fchmod() and
 fchown(), so that it can later on still access the clock touch file.
 
 What am I missing?

It works for me because the file exists already with correct owner and
permissions. I'd put CAP_CHOWN and CAP_FOWNER back for now.

Maybe the clock file could reside in a separate subdirectory, then the
directory owner and permissions could be set up already at installation?

 
 No new privileges are needed, especially no setuid fixups are
 expected.
 
 Yeah, we should probably set this for most of our daemons, not just
 timesyncd. 
 
 I am not entirely sure how NoNewPrivileges= and no-setuid-fixup
 actually relate to each other. I.e. what does one do that the other
 doesn't? And if they do different things, isn't NoNewPrivileges= kinda
 a superset of no-setuid-fixup, and thus makes setting the latter
 redundant?

I think NoNewPrivileges (prctl(PR_SET_NO_NEW_PRIVS)) prevents any kind
of uid/gid changes, while no-setuid-fixup does not restrict uid change
but does not elevate the capability set.

 
 Reading through Documentation/prctl/no_new_privs.txt in the
 kernel sources I found this paragraph:
 
Be careful, though: LSMs might also not tighten constraints on
exec in no_new_privs mode.  (This means that setting up a
general-purpose service launcher to set no_new_privs before
execing daemons may interfere with LSM-based sandboxing.)
 
 This kinda suggests that SELinux and friends might not like
 NoNewPrivileges=, hence I am not entirely sure it is really the right
 option for us.
 
 Hence I am wondering if no-setuid-fixup (and -locked) might be the
 better option for us to enable everywhere?

I don't think timesyncd is executing any helper programs, especially
set-uid or with file system capabilities, so both should work fine.

 
 Device policy can be closed, timesyncd does not access any devices.
 
 Note that we set PrivateDevices=yes already, which implies
 DevicePolicy=closed.
 
 Timesyncd only needs write access to /var/lib/systemd/clock. There's no
 need to access /boot nor most API filesystems.
 
 Well, /boot is already covered by ProtectSystem=full actually (but
 this wasn't documented, admittedly. Updated the man page now in gate.)
 
 I am a bit concerned about listing by default all these directories in
 the unit file. I mean, the reason why ProtectSystem=, ProtectHome= and
 suchlike exists is precisely to break this down to a few booleans (or
 boolean-like enums), instead of listing all directories in all unit
 files all the time. I mean, it's great if people do this on their
 local systems, but to make this palatable generically upstream I
 created ProtectSystem= and ProtectHome= to hide the directory lists
 away.
 
 The problem with these lists is that we need to be really conservative
 with them, since many of the libs we use (including glibc) use
 different interface behind our back, and thus writing generally
 valid rules, that work on all archs, on all distros, on all
 glibc/libary versions is hard. This is what SELinux policy does, but I
 *really* didn't want to create another implementation of such a
 finegrained policy, if you follow what I mean.

I'm not sure. Shouldn't we then ship a SELinux policy file then? Would
you be interested in AppArmor profile for timesyncd, I have one? Also,
if a distro uses weird SELinux policies or setuid helpers at every
possible opportunity, shouldn't they have some responsibility of fixing
their setup?

 
 Install a system call filter, generated with 'strace -c'.
 
 Hmm, I am a bit concerned about this part, I am not sure we can really
 do this upstream. Different glibc and other library versions we use
 invoke different syscalls. Moreover, different archs use different
 syscalls, too. This makes it really difficult putting together syscall
 filters upsteram that work generically, on all downstream versions.
 
 I think SystemCallFilter= is something that end-users can use on their
 systems, but I am not sure we can use this upstream. :-(

Right, this was generated on x86_64.

 
 Only one additional process is needed.
 
 Hmm, LimitNPROC=2 unfortunately doesn't do what we want it to do:
 since the daemon drops the to the systemd-timesync user on its own,
 PID 1 invokes it as root, not as the the systemd-timesync user, which
 makes the excercise pointless... We should probably handle this better
 though, and warn, instead of silently accepting it. Added that to the
 TODO list now.

This probably applies to other limits too?

 
 I have now changed the timesyncd code to do the right thing on its own
 after dropping privs. During runtime RLIMIT_NPROC=2 should now be set.
 
 Mounts should not propagate back, so set

Re: [systemd-devel] PrivateDevices with more than basic set of devices?

2015-01-27 Thread Topi Miettinen
On 01/26/15 23:46, Lennart Poettering wrote:
 But independently of the PrivateDevices thing, would you think
 tmpfiles.d could be extended to be usable for unit specific cases
 instead of just one global setup? I think there could be more uses, for
 example, creating directories and links inside a unit's
 RuntimeDirectory.
 
 I am not sure how this could work and what kind of integration you
 precisely are looking for there?
 
 Note that tmpfiles exists mostly for two reasons: a) to deal with old
 software that wasn't capable of creating its own subdirs/stuff below
 its runtime directory; and b) to deal with software whose main program
 was running unpriviliged all the time (for example by using User=),
 and hence lacked the priviliges to set up its subdir in /run.

a) was exactly my case, auditd doesn't have a way to specify where to
put the pid file yet, so it ends up in /run/auditd.pid.

 
 Now, to deal with case b) we nowadays have RuntimeDirectory=. And for
 a) I think the long story must be to make it set up its own stuff in
 /run, and I don#t see how tmpfiles could break any benefit there...
 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] PrivateDevices with more than basic set of devices?

2015-01-27 Thread Topi Miettinen
On 01/26/15 21:04, Lennart Poettering wrote:
 On Mon, 26.01.15 17:07, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 On 01/26/15 12:41, Simon McVittie wrote:
 On 24/01/15 10:09, Topi Miettinen wrote:
 For example, smartd only needs access to /dev/sd*.

 Let me spell that differently: smartd only needs the ability to make
 arbitrary filesystem changes, defeating any possible configurable
 security mechanism.

 Not exactly: it only needs read access. Depending on the system, that
 could be very different from being able to make arbitrary filesystem
 changes.
 
 Sending SMART requests requires the same priviliges as issue direct
 low-level write requests to my knowledge, hence I'd say simon is right.

CAP_SYS_RAWIO, yes. Only read access is needed otherwise:
DevicePolicy=closed
DeviceAllow=block-sd r
DeviceAllow=/dev/sda r
DeviceAllow=/dev/sdb r
works fine here.

Probably CAP_SYS_RAWIO can be used to circumvent the lack of write
access, though.

-Topi

 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] PrivateDevices with more than basic set of devices?

2015-01-27 Thread Topi Miettinen
On 01/27/15 20:52, Lennart Poettering wrote:
 On Tue, 27.01.15 18:51, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 On 01/26/15 21:04, Lennart Poettering wrote:
 On Mon, 26.01.15 17:07, Topi Miettinen (toiwo...@gmail.com) wrote:

 On 01/26/15 12:41, Simon McVittie wrote:
 On 24/01/15 10:09, Topi Miettinen wrote:
 For example, smartd only needs access to /dev/sd*.

 Let me spell that differently: smartd only needs the ability to make
 arbitrary filesystem changes, defeating any possible configurable
 security mechanism.

 Not exactly: it only needs read access. Depending on the system, that
 could be very different from being able to make arbitrary filesystem
 changes.

 Sending SMART requests requires the same priviliges as issue direct
 low-level write requests to my knowledge, hence I'd say simon is right.

 CAP_SYS_RAWIO, yes. Only read access is needed otherwise:
 DevicePolicy=closed
 DeviceAllow=block-sd r
 DeviceAllow=/dev/sda r
 DeviceAllow=/dev/sdb r
 works fine here.
 
 You should be able to reduce this to simply:
 
 DeviceAllow=block-sd r
 
 This should suffic since DevicePolicy=closed is implied if there's at
 least one DeviceAllow= specified. And DeviceAllow=block-sd r enables
 access to all /dev/sd* access, which includes /dev/sda and /dev/sdb,
 of course.

In general yes, but I didn't want to allow SMART requests to /dev/sdc,
it's a DVD-ROM drive and there are useless errors if accessed with SMART.

 
 Probably CAP_SYS_RAWIO can be used to circumvent the lack of write
 access, though.
 
 Yes, it can.
 
 I think in general though that sandboxing should be done even if there
 are ways to circumvent it. It might still protects from accidental
 mishaps even if it doesn't prevent attackers to exploit things.
 
 That said, as mentioned in my earlier mail smartd will probably not be
 happy with accessing only /dev/sd*, it wants access to all kinds of
 other devices. For example it has special support for certain SCSI
 RAID drivers and such, which are not accessed via /dev/sd*...
 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] PrivateDevices with more than basic set of devices?

2015-01-27 Thread Topi Miettinen
On 01/27/15 21:40, Lennart Poettering wrote:
 On Tue, 27.01.15 21:38, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 CAP_SYS_RAWIO, yes. Only read access is needed otherwise:
 DevicePolicy=closed
 DeviceAllow=block-sd r
 DeviceAllow=/dev/sda r
 DeviceAllow=/dev/sdb r
 works fine here.

 You should be able to reduce this to simply:

 DeviceAllow=block-sd r

 This should suffic since DevicePolicy=closed is implied if there's at
 least one DeviceAllow= specified. And DeviceAllow=block-sd r enables
 access to all /dev/sd* access, which includes /dev/sda and /dev/sdb,
 of course.

 In general yes, but I didn't want to allow SMART requests to /dev/sdc,
 it's a DVD-ROM drive and there are useless errors if accessed with
 SMART.
 
 Well, don't you just get a different error then?

Embarrassingly it looks like I actually fixed the error by editing
smartd.conf and not with the unit file...

-Topi

 
 That said, if this is really what you want, then you should really
 remove the DeviceAllow=block-sd r line, since that opens up access
 to /dev/sdc, too...
 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] PrivateDevices with more than basic set of devices?

2015-01-27 Thread Topi Miettinen
On 01/27/15 20:48, Lennart Poettering wrote:
 On Tue, 27.01.15 19:04, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 On 01/26/15 23:46, Lennart Poettering wrote:
 But independently of the PrivateDevices thing, would you think
 tmpfiles.d could be extended to be usable for unit specific cases
 instead of just one global setup? I think there could be more uses, for
 example, creating directories and links inside a unit's
 RuntimeDirectory.

 I am not sure how this could work and what kind of integration you
 precisely are looking for there?

 Note that tmpfiles exists mostly for two reasons: a) to deal with old
 software that wasn't capable of creating its own subdirs/stuff below
 its runtime directory; and b) to deal with software whose main program
 was running unpriviliged all the time (for example by using User=),
 and hence lacked the priviliges to set up its subdir in /run.

 a) was exactly my case, auditd doesn't have a way to specify where to
 put the pid file yet, so it ends up in /run/auditd.pid.
 
 Hmm, but that's fine, no? What would you put in tmpfiles for auditd?

I'd want it to put the pid file somewhere else, like RuntimeDirectory.

L/run/auditd.pid ----   /run/auditd/auditd.pid

This is probably a bad example as pid files could be deleted by the
daemon at exit.

 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] timesyncd: tighten unit file

2015-01-27 Thread Topi Miettinen
On 01/27/15 21:16, Lennart Poettering wrote:
 On Tue, 27.01.15 19:47, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 On 01/27/15 01:54, Lennart Poettering wrote:
 On Sun, 25.01.15 12:23, Topi Miettinen (toiwo...@gmail.com) wrote:

 There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.

 Hmm, that's not true, is it? load_clock_timestamp() is invoked before
 we drop privs in the daemon. And it certainly calls fchmod() and
 fchown(), so that it can later on still access the clock touch file.

 What am I missing?

 It works for me because the file exists already with correct owner and
 permissions. I'd put CAP_CHOWN and CAP_FOWNER back for now. 
 
 Well, but this also means CAP_DAC_OVERRIDE is required, as otherwise
 we won't be able to access the file while we are still root, and have
 not dropped privileges anymore.
 
 Note that timesyncd actually drops all remaining caps when changng
 to the systemd-timesync user, with the exception of
 CAP_SYS_TIME. This means that the caps configured in the unit file
 don't matter too much as they only are in effect for the short time
 when timesyncd is initializing and hasn't dropped all the remaining
 caps except for CAP_SYS_TIME yet.
 
 Maybe the clock file could reside in a separate subdirectory, then the
 directory owner and permissions could be set up already at
 installation?
 
 Well, to enable stateless systems I think it is a good idea to write
 services in a way that they can rebuild what they need in /var on
 their own, should it be missing, so that /var can be flushed out and
 things will just work when the machine is then booted again.
 
 All our daemons are written in a style where /var is reconstructed
 implicitly if it is missing, and timesyncd really should work the same
 way.

Nice, but the directories could be created with tmpfiles.d then?

 
 I am not entirely sure how NoNewPrivileges= and no-setuid-fixup
 actually relate to each other. I.e. what does one do that the other
 doesn't? And if they do different things, isn't NoNewPrivileges= kinda
 a superset of no-setuid-fixup, and thus makes setting the latter
 redundant?

 I think NoNewPrivileges (prctl(PR_SET_NO_NEW_PRIVS)) prevents any kind
 of uid/gid changes, while no-setuid-fixup does not restrict uid change
 but does not elevate the capability set.
 
 The PR_SET_NO_NEW_PRIVS docs say explicitly it only applies to
 execve(). 
 
 To be honest, I am still very puzzled about this. The docs are awful
 about this...
 
 Reading through Documentation/prctl/no_new_privs.txt in the
 kernel sources I found this paragraph:

Be careful, though: LSMs might also not tighten constraints on
exec in no_new_privs mode.  (This means that setting up a
general-purpose service launcher to set no_new_privs before
execing daemons may interfere with LSM-based sandboxing.)

 This kinda suggests that SELinux and friends might not like
 NoNewPrivileges=, hence I am not entirely sure it is really the right
 option for us.

 Hence I am wondering if no-setuid-fixup (and -locked) might be the
 better option for us to enable everywhere?

 I don't think timesyncd is executing any helper programs, especially
 set-uid or with file system capabilities, so both should work fine.
 
 Well, the way I read the paragraph above if we set PR_SET_NO_NEW_PRIVS
 after forking, but before execing systemd-timesyncd, and that binary
 has an selinux label on it, that the selinux label would be ignore,
 and things would continue to run with the same label as we forked
 off. This pretty much renders SELinux usesless, since all services
 where we set the bit for would then run as init_t... and that's
 something we really shouldn't do.

seccomp_filter.txt on the other hand says that
Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
run with CAP_SYS_ADMIN privileges in its namespace.  If these are not
true, -EACCES will be returned.  This requirement ensures that filter
programs cannot be applied to child processes with greater privileges
than the task that installed them.

Does this mean that SELinux and seccomp are mutually exclusive? Awful
design if so.

 
 The problem with these lists is that we need to be really conservative
 with them, since many of the libs we use (including glibc) use
 different interface behind our back, and thus writing generally
 valid rules, that work on all archs, on all distros, on all
 glibc/libary versions is hard. This is what SELinux policy does, but I
 *really* didn't want to create another implementation of such a
 finegrained policy, if you follow what I mean.

 I'm not sure. Shouldn't we then ship a SELinux policy file then? Would
 you be interested in AppArmor profile for timesyncd, I have one? Also,
 if a distro uses weird SELinux policies or setuid helpers at every
 possible opportunity, shouldn't they have some responsibility of fixing
 their setup?
 
 Well, SELinux policy is managed in a central selinux policy database
 that is shipped in one big RPM. My

Re: [systemd-devel] PrivateDevices with more than basic set of devices?

2015-01-27 Thread Topi Miettinen
On 01/27/15 21:35, Lennart Poettering wrote:
 On Tue, 27.01.15 21:32, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 On 01/27/15 20:48, Lennart Poettering wrote:
 On Tue, 27.01.15 19:04, Topi Miettinen (toiwo...@gmail.com) wrote:

 On 01/26/15 23:46, Lennart Poettering wrote:
 But independently of the PrivateDevices thing, would you think
 tmpfiles.d could be extended to be usable for unit specific cases
 instead of just one global setup? I think there could be more uses, for
 example, creating directories and links inside a unit's
 RuntimeDirectory.

 I am not sure how this could work and what kind of integration you
 precisely are looking for there?

 Note that tmpfiles exists mostly for two reasons: a) to deal with old
 software that wasn't capable of creating its own subdirs/stuff below
 its runtime directory; and b) to deal with software whose main program
 was running unpriviliged all the time (for example by using User=),
 and hence lacked the priviliges to set up its subdir in /run.

 a) was exactly my case, auditd doesn't have a way to specify where to
 put the pid file yet, so it ends up in /run/auditd.pid.

 Hmm, but that's fine, no? What would you put in tmpfiles for auditd?

 I'd want it to put the pid file somewhere else, like RuntimeDirectory.

 L/run/auditd.pid ----   /run/auditd/auditd.pid

 This is probably a bad example as pid files could be deleted by the
 daemon at exit.
 
 I think it would be better to fix this in auditd itself and make it
 use a proper dir below /run, rather than store its stuff in /run
 itself...

Fully agree.

-Topi

 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] PrivateDevices with more than basic set of devices?

2015-01-26 Thread Topi Miettinen
On 01/26/15 12:41, Simon McVittie wrote:
 On 24/01/15 10:09, Topi Miettinen wrote:
 For example, smartd only needs access to /dev/sd*.
 
 Let me spell that differently: smartd only needs the ability to make
 arbitrary filesystem changes, defeating any possible configurable
 security mechanism.

Not exactly: it only needs read access. Depending on the system, that
could be very different from being able to make arbitrary filesystem
changes.

 
 If you give it access to /dev/sd* but not to other devices, what
 security or safety have you actually gained, compared with giving it all
 of /dev?

Maybe nothing. But why should smartd be able access any other devices?

 
 Admittedly, there are better examples, like saned only needing access to
 USB scanners (plus SCSI scanners, serial ports and parallel ports if you
 care about older hardware). I suspect device permissions are a rather
 better answer for finer-grained access control than all or nothing,
 though.

If a device does not exist at all, it's harder to access it than if only
device permissions and/or SELinux protect it. Not impossible, but harder.

-Topi

 
 S
 
 ___
 systemd-devel mailing list
 systemd-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/systemd-devel
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] PrivateDevices with more than basic set of devices?

2015-01-26 Thread Topi Miettinen
On 01/26/15 16:13, Lennart Poettering wrote:
 On Sat, 24.01.15 10:09, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 Hello,

 It would be useful to be able to use PrivateDevices with additional
 devices to the basic set (null, zero, urandom etc). For example, smartd
 only needs access to /dev/sd*. It would be a bit complex to do this
 without help of systemd, you would have to set up the private /dev
 filesystem by hand before starting the daemon.
 
 First of all, smartd usually only acesses /dev/sd*, but it actually
 has drivers that use quite a few other device nodes too (for example
 to support that weird SCSI SMART stuff). Thus, limiting access to only
 /dev/sd* might work in many specific cases, but certainly not in the
 general case.
 
 However, the bigger problem is that setting /dev up like that would
 not be a one-time thing. As new devices appear or old devices
 disappear the device nodes in the service-specific /dev would have to
 be created/updated/removed. And that's substantial complexity.
 
 PrivateDevices= is supposed to be a one-stop solution for services
 that that need zero device access, and I am not convinced we should
 turn it into more than that. It's supposed to be simple, easy to
 understand. Right now, I can tell people easily: hey, if your service
 never needs access to physical devices, turn on PrivateDevices=!, and
 it is really that easy, there's nothing more to know. However, if we
 turn this into something more than that, then the whole thing becomes
 a ton more complex, not only in code, but also in explaining it to
 people.

I think the code would not get much complex. The device list would be
gathered and passed to mount_dev() in namespace.c. And documentation can
be easily expanded.

 
 Also, the level of security that PrivateDevices= provides is not
 substantially more than configuring the devices cgroup
 controller. The latter only controls access, the former also hides the
 device nodes, but that's about it. However, the latter you configure
 much more flexibily, and you can do one thing specifically: you can
 actually configure access to device nodes of whole classes of
 things. For example DeviceAllow=block-sd can enforce access
 limitation on what you are asking for. And it doesn't need any
 complexity to handle when devices come and go, because the rule is
 independent of actual device nodes in the file system.

Good point.

 
 How about this: When PrivateDevices is enabled (perhaps with a new
 extended mode like PrivateDevices=Auto?), any DeviceAllow directives
 would automatically append the device in question to the list of devices
 to be copied to the private /dev. The list of devices could be stated
 with a new directive instead (CopyDevices=/dev/sda /dev/sdb).

 Or perhaps tmpfiles.d should be extended instead, that would allow more
 actions than just device setup? For example, unit files could point to a
 tmpfiles.d directory or file that will be processed inside the unit
 container before the unit is executed?
 
 Both of these proposals cannot deal with devices coming and going, and
 that's kind of a major deal breaker, since we try not to wait for
 devices during boot-up more than necessary, and hence not even for
 always plugged in devices this could ever work without races...

Maybe udev should be able to handle several /dev directories in the
system, each with a different configuration...

But independently of the PrivateDevices thing, would you think
tmpfiles.d could be extended to be usable for unit specific cases
instead of just one global setup? I think there could be more uses, for
example, creating directories and links inside a unit's RuntimeDirectory.

 
 Sorry,
 
 Lennart
 

Thanks for the long explanations.

-Topi

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] timesyncd: tighten unit file

2015-01-25 Thread Topi Miettinen
There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.

No new privileges are needed, especially no setuid fixups are expected.

Device policy can be closed, timesyncd does not access any devices.

Timesyncd only needs write access to /var/lib/systemd/clock. There's no
need to access /boot nor most API filesystems.

Install a system call filter, generated with 'strace -c'.

Only one additional process is needed.

Mounts should not propagate back, so set the MountFlags to slave
(actually default since we use e.g. PrivateTmp).
---
 units/systemd-timesyncd.service.in | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/units/systemd-timesyncd.service.in 
b/units/systemd-timesyncd.service.in
index 39edafc..ef09f05 100644
--- a/units/systemd-timesyncd.service.in
+++ b/units/systemd-timesyncd.service.in
@@ -22,12 +22,21 @@ Type=notify
 Restart=always
 RestartSec=0
 ExecStart=@rootlibexecdir@/systemd-timesyncd
-CapabilityBoundingSet=CAP_SYS_TIME CAP_SETUID CAP_SETGID CAP_SETPCAP CAP_CHOWN 
CAP_DAC_OVERRIDE CAP_FOWNER
+CapabilityBoundingSet=CAP_SYS_TIME CAP_SETUID CAP_SETGID CAP_SETPCAP
+NoNewPrivileges=true
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
 PrivateTmp=yes
 PrivateDevices=yes
+DevicePolicy=closed
 ProtectSystem=full
 ProtectHome=yes
+InaccessibleDirectories=/dev/pts /dev/shm /dev/mqueue /dev/hugepages /boot /sys
+ReadOnlyDirectories=/
+ReadWriteDirectories=/var/lib/systemd/clock
 WatchdogSec=1min
+SystemCallFilter=recvfrom clock_gettime prctl read open close stat fstat poll 
lseek mmap mprotect munmap brk rt_sigaction rt_sigprocmask ioctl access madvise 
socket connect sendto sendmsg recvmsg bind getsockname socketpair setsockopt 
getsockopt clone fcntl umask getrlimit setgroups setresuid setresgid capget 
capset arch_prctl gettid futex set_tid_address epoll_wait epoll_ctl 
inotify_add_watch set_robust_list utimensat timerfd_create timerfd_settime 
signalfd4 epoll_create1 inotify_init1 clock_adjtime sendmmsg
+LimitNPROC=2
+MountFlags=slave
 
 [Install]
 WantedBy=sysinit.target
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] libudev-monitor: ensure proper string termination

2015-01-24 Thread Topi Miettinen
On 01/25/15 03:34, Zbigniew Jędrzejewski-Szmek wrote:
 On Sat, Jan 24, 2015 at 10:39:56AM +0200, Topi Miettinen wrote:
 Leave space for the terminating zero when reading and make sure
 that the last byte is zero. This also makes the check for long packets
 equivalent to code before 9c89c1ca: reject also packets with size 8192.
 ---
  src/libudev/libudev-monitor.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/src/libudev/libudev-monitor.c b/src/libudev/libudev-monitor.c
 index 4cfb2f6..b7fc031 100644
 --- a/src/libudev/libudev-monitor.c
 +++ b/src/libudev/libudev-monitor.c
 @@ -590,7 +590,7 @@ retry:
  if (udev_monitor == NULL)
  return NULL;
  iov.iov_base = buf;
 -iov.iov_len = sizeof(buf);
 +iov.iov_len = sizeof(buf) - 1; /* Leave space for terminating zero 
 */
  memzero(smsg, sizeof(struct msghdr));
  smsg.msg_iov = iov;
  smsg.msg_iovlen = 1;
 @@ -642,6 +642,8 @@ retry:
  if (udev_device == NULL)
  return NULL;
  
 +buf.raw[sizeof(buf.raw) - 1] = '\0';
 +
  if (memcmp(buf.raw, libudev, 8) == 0) {
  /* udev message needs proper version magic */
  if (buf.nlh.magic != htonl(UDEV_MONITOR_MAGIC)) {
 A buffer only needs to be terminated by a zero in certain cases: usually if it
 is passed to a function which expectes that. iovecs can contain binary data,
 and have an explicit size field, so they do not need to be zero-terminated.
 Is there a reason why the buffer has to be zero-terminated in this case?

String functions strcmp, strlen and strstr, used a few lines later,
expect null byte terminated strings. Alternatively they could be changed
to strncmp and friends where the scope can be limited to only the buffer.

-Topi

 
 Zbyszek
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] PrivateDevices with more than basic set of devices?

2015-01-24 Thread Topi Miettinen
Hello,

It would be useful to be able to use PrivateDevices with additional
devices to the basic set (null, zero, urandom etc). For example, smartd
only needs access to /dev/sd*. It would be a bit complex to do this
without help of systemd, you would have to set up the private /dev
filesystem by hand before starting the daemon.

How about this: When PrivateDevices is enabled (perhaps with a new
extended mode like PrivateDevices=Auto?), any DeviceAllow directives
would automatically append the device in question to the list of devices
to be copied to the private /dev. The list of devices could be stated
with a new directive instead (CopyDevices=/dev/sda /dev/sdb).

Or perhaps tmpfiles.d should be extended instead, that would allow more
actions than just device setup? For example, unit files could point to a
tmpfiles.d directory or file that will be processed inside the unit
container before the unit is executed?

-Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] libudev-monitor: ensure proper string termination

2015-01-24 Thread Topi Miettinen
Leave space for the terminating zero when reading and make sure
that the last byte is zero. This also makes the check for long packets
equivalent to code before 9c89c1ca: reject also packets with size 8192.
---
 src/libudev/libudev-monitor.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/libudev/libudev-monitor.c b/src/libudev/libudev-monitor.c
index 4cfb2f6..b7fc031 100644
--- a/src/libudev/libudev-monitor.c
+++ b/src/libudev/libudev-monitor.c
@@ -590,7 +590,7 @@ retry:
 if (udev_monitor == NULL)
 return NULL;
 iov.iov_base = buf;
-iov.iov_len = sizeof(buf);
+iov.iov_len = sizeof(buf) - 1; /* Leave space for terminating zero */
 memzero(smsg, sizeof(struct msghdr));
 smsg.msg_iov = iov;
 smsg.msg_iovlen = 1;
@@ -642,6 +642,8 @@ retry:
 if (udev_device == NULL)
 return NULL;
 
+buf.raw[sizeof(buf.raw) - 1] = '\0';
+
 if (memcmp(buf.raw, libudev, 8) == 0) {
 /* udev message needs proper version magic */
 if (buf.nlh.magic != htonl(UDEV_MONITOR_MAGIC)) {
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] libudev: fix check for too long packet

2015-01-23 Thread Topi Miettinen
On 01/23/15 17:43, Lennart Poettering wrote:
 On Fri, 23.01.15 17:29, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 On 01/23/15 03:06, Lennart Poettering wrote:
 On Sun, 18.01.15 23:57, Topi Miettinen (toiwo...@gmail.com) wrote:

 Don't use recvmsg(2) return value to check for too long packets
 (it doesn't work) but MSG_TRUNC flag.

 Why precisely doesn't this work? I mean, it will consider messages
 that are exactly as large as the buffer as too long, but otherwise the
 old check should be fine, no?

 It doesn't work because the return value of recvmsg() never exceeds the
 buffer size, so too large packets are never detected.
 
 But the test was =, not . So the old code *did* recognize all
 too large packets, though it would already do so one byte earlier than
 your new check...

True. What should be considered too large, a full buffer (which might
not contain a trailing zero, so the strcmp later could fall off of the
buffer...), or buffer size - 1 (the last byte is not explicitly set to
zero, so badness could happen anyway)?

-Topi

 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] libudev: fix check for too long packet

2015-01-23 Thread Topi Miettinen
On 01/23/15 03:06, Lennart Poettering wrote:
 On Sun, 18.01.15 23:57, Topi Miettinen (toiwo...@gmail.com) wrote:
 
 Don't use recvmsg(2) return value to check for too long packets
 (it doesn't work) but MSG_TRUNC flag.
 
 Why precisely doesn't this work? I mean, it will consider messages
 that are exactly as large as the buffer as too long, but otherwise the
 old check should be fine, no?

It doesn't work because the return value of recvmsg() never exceeds the
buffer size, so too large packets are never detected.

-Topi

 
 (The new check is much nicer though, admittedly)
 
 ---
  src/libudev/libudev-monitor.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/libudev/libudev-monitor.c b/src/libudev/libudev-monitor.c
 index 484fefe..d8e551b 100644
 --- a/src/libudev/libudev-monitor.c
 +++ b/src/libudev/libudev-monitor.c
 @@ -609,7 +609,7 @@ retry:
  return NULL;
  }
  
 -if (buflen  32 || (size_t)buflen = sizeof(buf)) {
 +if (buflen  32 || smsg.msg_flags  MSG_TRUNC) {
  log_debug(invalid message length);
  return NULL;
  }
 -- 
 2.1.4

 ___
 systemd-devel mailing list
 systemd-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/systemd-devel
 
 
 Lennart
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH v2] backlight: let the administrator override clamping

2015-01-18 Thread Topi Miettinen
On my computer, the minimum brightness enforced by clamping in
backlight is too bright.

Add a new option to override clamping in unit file. While at it, describe
the clamping in documentation.
---
 man/systemd-backli...@.service.xml | 27 +--
 src/backlight/backlight.c  | 11 ++-
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/man/systemd-backli...@.service.xml 
b/man/systemd-backli...@.service.xml
index 453afbf..0d30957 100644
--- a/man/systemd-backli...@.service.xml
+++ b/man/systemd-backli...@.service.xml
@@ -48,7 +48,17 @@
 
 refsynopsisdiv
 parafilenamesystemd-backlight@.service/filename/para
-
parafilename/usr/lib/systemd/systemd-backlight/filename/para
+cmdsynopsis
+commandsystemd-backlight/command
+arg choice=plainsave/arg
+arg choice=optreplaceablePATH/replaceable/arg
+/cmdsynopsis
+cmdsynopsis
+commandsystemd-backlight/command
+arg choice=plainload/arg
+arg choice=optreplaceablePATH/replaceable/arg
+arg 
choice=optreplaceable--allow-any-brightness/replaceable/arg
+/cmdsynopsis
 /refsynopsisdiv
 
 refsect1
@@ -58,7 +68,20 @@
 is a service that restores the display backlight
 brightness at early boot and saves it at shutdown. On
 disk, the backlight brightness is stored in
-filename/var/lib/systemd/backlight//filename./para
+filename/var/lib/systemd/backlight//filename. During
+loading, unless option
+replaceable--allow-any-brightness/replaceable is
+specified, the brightness is clamped to at least value
+literal1/literal or 5% of maximum brightness. This
+is to avoid an unpowered display due to poor API
+design in kernel (unfixed as of 2015-01-18) that does
+not allow user space to know the difference between
+lowest brightness and powering off the
+backlight./para
+
+parareplaceablePATH/replaceable identifies the
+display brightness control device. It is resolved by
+udev./para
 /refsect1
 
 refsect1
diff --git a/src/backlight/backlight.c b/src/backlight/backlight.c
index 1271a66..96a25d8 100644
--- a/src/backlight/backlight.c
+++ b/src/backlight/backlight.c
@@ -255,7 +255,7 @@ static void clamp_brightness(struct udev_device *device, 
char **value, unsigned
 return;
 }
 
-log_info(Saved brightness %s %s to %s., old_value,
+log_info(Saved brightness %s %s to %s (use 
--allow-any-brightness to override)., old_value,
  new_brightness  brightness ?
  too low; increasing : too high; decreasing,
  *value);
@@ -272,8 +272,8 @@ int main(int argc, char *argv[]) {
 unsigned max_brightness;
 int r;
 
-if (argc != 3) {
-log_error(This program requires two arguments.);
+if (argc  3 || argc  4) {
+log_error(This program requires two or three arguments.);
 return EXIT_FAILURE;
 }
 
@@ -389,8 +389,9 @@ int main(int argc, char *argv[]) {
 log_error_errno(r, Failed to read %s: %m, saved);
 return EXIT_FAILURE;
 }
-
-clamp_brightness(device, value, max_brightness);
+/* Don't clamp brightness if asked */
+if (!(argc == 4  streq(argv[3], --allow-any-brightness)))
+clamp_brightness(device, value, max_brightness);
 
 r = udev_device_set_sysattr_value(device, brightness, value);
 if (r  0) {
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] libudev: fix check for too long packet

2015-01-18 Thread Topi Miettinen
Don't use recvmsg(2) return value to check for too long packets
(it doesn't work) but MSG_TRUNC flag.
---
 src/libudev/libudev-monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/libudev/libudev-monitor.c b/src/libudev/libudev-monitor.c
index 484fefe..d8e551b 100644
--- a/src/libudev/libudev-monitor.c
+++ b/src/libudev/libudev-monitor.c
@@ -609,7 +609,7 @@ retry:
 return NULL;
 }
 
-if (buflen  32 || (size_t)buflen = sizeof(buf)) {
+if (buflen  32 || smsg.msg_flags  MSG_TRUNC) {
 log_debug(invalid message length);
 return NULL;
 }
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Suspicious assertions in resolved

2015-01-18 Thread Topi Miettinen
On 01/18/15 20:45, David Herrmann wrote:
 Hi
 
 On Sun, Jan 18, 2015 at 8:12 PM, Topi Miettinen toiwo...@gmail.com wrote:
 Hello,

 I think resolved_manager.c function manager_recv() has an assertion that
 could be triggerable by the server sending an oversized packet:

 assert(!(mh.msg_flags  MSG_TRUNC));

 The other assertions look suspicious too but I don't know if they can
 really be triggered by the other side.
 
 We use FIONREAD to read the size of the next pending datagram.
 Therefore, MSG_TRUNC cannot be set. Similarly, we provide suitable
 control-data space so MSG_CTRUNC cannot be set, either.

OK. What about the assertions later, is it possible to receive a reply
via IPv6 for IPv4 request or the other way around?

 
 Thanks
 David
 
 I'd propose something like this:

 diff --git a/src/resolve/resolved-manager.c b/src/resolve/resolved-manager.c
 index 0594479..b1defa3 100644
 --- a/src/resolve/resolved-manager.c
 +++ b/src/resolve/resolved-manager.c
 @@ -894,7 +894,8 @@ int manager_recv(Manager *m, int fd, DnsProtocol
 protocol, DnsPacket **ret) {
  return -EIO;

  assert(!(mh.msg_flags  MSG_CTRUNC));
 -assert(!(mh.msg_flags  MSG_TRUNC));
 +if (mh.msg_flags  MSG_TRUNC)
 +return -EIO;

  p-size = (size_t) l;


 -Topi
 ___
 systemd-devel mailing list
 systemd-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/systemd-devel

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Suspicious assertions in resolved

2015-01-18 Thread Topi Miettinen
On 01/18/15 21:23, David Herrmann wrote:
 Hi
 
 On Sun, Jan 18, 2015 at 10:15 PM, Topi Miettinen toiwo...@gmail.com wrote:
 On 01/18/15 20:45, David Herrmann wrote:
 Hi

 On Sun, Jan 18, 2015 at 8:12 PM, Topi Miettinen toiwo...@gmail.com wrote:
 Hello,

 I think resolved_manager.c function manager_recv() has an assertion that
 could be triggerable by the server sending an oversized packet:

 assert(!(mh.msg_flags  MSG_TRUNC));

 The other assertions look suspicious too but I don't know if they can
 really be triggered by the other side.

 We use FIONREAD to read the size of the next pending datagram.
 Therefore, MSG_TRUNC cannot be set. Similarly, we provide suitable
 control-data space so MSG_CTRUNC cannot be set, either.

 OK. What about the assertions later, is it possible to receive a reply
 via IPv6 for IPv4 request or the other way around?
 
 Those asserts verify that the CMSG socket-type is the same as the
 PACKET/PEER socket-type. I don't see how this looks at the
 request-type? Those assert()s look good to me.

OK, sorry for the noise.

-Topi

 Thanks
 David
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] backlight: let the administrator override clamping

2015-01-17 Thread Topi Miettinen
On 01/17/15 11:38, David Herrmann wrote:
 Hi
 
 On Sat, Jan 17, 2015 at 12:28 PM, Topi Miettinen toiwo...@gmail.com wrote:
 On my computer, the minimum brightness enforced by clamping in
 backlight is too bright.
 
 How can 5% of the backlight be too bright? Can you give some
 information on your backlight device? (type, max_brightness,
 actual_brightness and so on).

Well, my eyes start to hurt with level 1, 0 is OK. Max_brightness is 9,
actual_brightness always matches brightness. This is cheap old Acer
Aspire 8530 laptop with Mobility Radeon HD 3200, I don't know beyond
that what handles backlight.

 
 Btw., we use gnu-getopt_long() style arguments with two dashes. And we
 try to avoid negations in option names.

Ok. How about --disable-clamping or --allow-low-brightness, or would you
have better ideas?

 
 Thanks
 David
 
 Add a new option to override clamping in unit file. While at it, describe
 the clamping in documentation.
 ---
  man/systemd-backli...@.service.xml | 23 +--
  src/backlight/backlight.c  | 11 ++-
  2 files changed, 27 insertions(+), 7 deletions(-)

 diff --git a/man/systemd-backli...@.service.xml 
 b/man/systemd-backli...@.service.xml
 index 453afbf..26a2437 100644
 --- a/man/systemd-backli...@.service.xml
 +++ b/man/systemd-backli...@.service.xml
 @@ -48,7 +48,17 @@

  refsynopsisdiv
  parafilenamesystemd-backlight@.service/filename/para
 -
 parafilename/usr/lib/systemd/systemd-backlight/filename/para
 +cmdsynopsis
 +commandsystemd-backlight/command
 +arg choice=plainsave/arg
 +arg 
 choice=optreplaceablePATH/replaceable/arg
 +/cmdsynopsis
 +cmdsynopsis
 +commandsystemd-backlight/command
 +arg choice=plainload/arg
 +arg 
 choice=optreplaceablePATH/replaceable/arg
 +arg 
 choice=optreplaceable-no-clamp/replaceable/arg
 +/cmdsynopsis
  /refsynopsisdiv

  refsect1
 @@ -58,7 +68,16 @@
  is a service that restores the display backlight
  brightness at early boot and saves it at shutdown. On
  disk, the backlight brightness is stored in
 -filename/var/lib/systemd/backlight//filename./para
 +filename/var/lib/systemd/backlight//filename. During
 +loading, unless option
 +replaceable-no-clamp/replaceable is specified, the
 +brightness is clamped to at least value
 +literal1/literal or 5% of maximum
 +brightness./para
 +
 +parareplaceablePATH/replaceable identifies the
 +display brightness control device. It is resolved by
 +udev./para
  /refsect1

  refsect1
 diff --git a/src/backlight/backlight.c b/src/backlight/backlight.c
 index 1271a66..a3f71e9 100644
 --- a/src/backlight/backlight.c
 +++ b/src/backlight/backlight.c
 @@ -255,7 +255,7 @@ static void clamp_brightness(struct udev_device *device, 
 char **value, unsigned
  return;
  }

 -log_info(Saved brightness %s %s to %s., old_value,
 +log_info(Saved brightness %s %s to %s (use -no-clamp to 
 override)., old_value,
   new_brightness  brightness ?
   too low; increasing : too high; decreasing,
   *value);
 @@ -272,8 +272,8 @@ int main(int argc, char *argv[]) {
  unsigned max_brightness;
  int r;

 -if (argc != 3) {
 -log_error(This program requires two arguments.);
 +if (argc  3 || argc  4) {
 +log_error(This program requires two or three arguments.);
  return EXIT_FAILURE;
  }

 @@ -389,8 +389,9 @@ int main(int argc, char *argv[]) {
  log_error_errno(r, Failed to read %s: %m, saved);
  return EXIT_FAILURE;
  }
 -
 -clamp_brightness(device, value, max_brightness);
 +/* Don't clamp brightness if asked */
 +if (!(argc == 4  streq(argv[3], -no-clamp)))
 +clamp_brightness(device, value, max_brightness);

  r = udev_device_set_sysattr_value(device, brightness, 
 value);
  if (r  0) {
 --
 2.1.4

 ___
 systemd-devel mailing list
 systemd-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/systemd-devel

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] backlight: let the administrator override clamping

2015-01-17 Thread Topi Miettinen
On 01/17/15 12:34, David Herrmann wrote:
 Hi
 
 On Sat, Jan 17, 2015 at 1:32 PM, Topi Miettinen toiwo...@gmail.com wrote:
 On 01/17/15 12:19, David Herrmann wrote:
 Hi

 On Sat, Jan 17, 2015 at 1:10 PM, Topi Miettinen toiwo...@gmail.com wrote:
 On 01/17/15 12:03, David Herrmann wrote:
 Hi

 On Sat, Jan 17, 2015 at 1:01 PM, Topi Miettinen toiwo...@gmail.com 
 wrote:
 On 01/17/15 11:38, David Herrmann wrote:
 Hi

 On Sat, Jan 17, 2015 at 12:28 PM, Topi Miettinen toiwo...@gmail.com 
 wrote:
 On my computer, the minimum brightness enforced by clamping in
 backlight is too bright.

 How can 5% of the backlight be too bright? Can you give some
 information on your backlight device? (type, max_brightness,
 actual_brightness and so on).

 Well, my eyes start to hurt with level 1, 0 is OK. Max_brightness is 9,
 actual_brightness always matches brightness. This is cheap old Acer
 Aspire 8530 laptop with Mobility Radeon HD 3200, I don't know beyond
 that what handles backlight.

 Which backlight driver is active? acpi? Or the native radeon driver?

 The device path is
 /sys/devices/pci\:00/\:00\:01.0/\:01\:05.0/backlight/acpi_video0/brightness,
 does that mean acpi?

 The problem here is, there're 2 types of backlight drivers in the kernel:

 1) backlight==0 is interpreted as 'off'
 2) backlight==0 is interpreted as 'lowest level that is not off'

 There is a second switch called 'bl_power' which allows to actually
 power the backlight off or on. This works on all devices the same way.
 However, if we want to figure out the lowest backlight level, we
 really cannot use '0' as we have no idea how the driver will interpret
 it.

 I see. Here, setting bl_power to 0 does nothing,
 
 Sure, if the hardware does not support power-down, it will not work.
 

 We discussed this at the last XDC but haven't really come to a
 conclusion. This is definitely a kernel bug, as user-space has no
 chance of setting a backlight generically to lowest level. If there
 were a property called min_brightness that exposes the minimal
 brightness level supported (which is not 'off'), we could parse it.
 However, we don't want to write per-driver workarounds in userspace if
 kernel people could just fix it.

 Conclusion: Lets try getting kernel backlight people to solve this mess.

 That may be the proper long term path, but because there's already a
 clamping workaround which does not do the right thing for all hardware,
 this override would be useful for such cases until the kernel is fixed.
 After the kernel is fixed, the clamping (along this override) should not
 be applied anymore.
 
 No, we still need clamping! In case your hardware has 256 brightness
 levels, we really don't want a brightness of 1 as it would still be
 far too dark.

How could you know that? The display can be too bright for the poor user
even at 4%.

-Topi

 
 Thanks
 David
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] backlight: let the administrator override clamping

2015-01-17 Thread Topi Miettinen
On 01/17/15 12:48, David Herrmann wrote:
 Hi
 
 On Sat, Jan 17, 2015 at 1:41 PM, Topi Miettinen toiwo...@gmail.com wrote:
 On 01/17/15 12:34, David Herrmann wrote:
 No, we still need clamping! In case your hardware has 256 brightness
 levels, we really don't want a brightness of 1 as it would still be
 far too dark.

 How could you know that? The display can be too bright for the poor user
 even at 4%.
 
 If the brightness scale is linear, I don't see how 5% can be too
 bright. We write defaults that shall work for everyone.

You are lucky not to have my eyes or hardware. It does not work for me.

 You can
 easily disable our helpers and employ your own. But we cant provide
 configuration hooks for each and every obscure use-case. We want
 people to disable the defaults and write their own, if they don't like
 the defaults.

I understand that you don't want additional knobs to be maintained, but
in this case it's not possible to override the clamping logic other way.
I can of course disable the backlight entirely but then I would miss the
save/restore functionality.

How about splitting the logic into two? The other helper would only
perform save and restore and the other would decide if the values are
reasonable for the current kernel version and hardware of the day?

 
 In your case, it's an outright bug, though. And we want bugs to be
 fixed, instead of workarounds if developers are too lazy to fix their
 stuff.

I fully agree, for me the bug is in clamping logic and the fact that it
can't be overridden or separated from loading.

 
 Thanks
 David
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH v3] Do not clear parent mount flags when setting up namespaces

2015-01-04 Thread Topi Miettinen
On 01/03/15 12:58, Topi Miettinen wrote:
 When setting up a namespace, flags like noexec, nosuid and nodev are
 cleared, so the mounts always have exec, suid, dev flags enabled.
 
 Copy parent directory mount flags when setting up a namespace and
 don't accidentally clear mount flags later.

Sorry, this version with statvfs() error checks breaks boot. I'm trying
to find out why.

 ---
  src/core/namespace.c | 12 ++--
  src/shared/util.c| 24 ++--
  src/shared/util.h|  2 ++
  3 files changed, 34 insertions(+), 4 deletions(-)
 
 diff --git a/src/core/namespace.c b/src/core/namespace.c
 index 4b8dbdd..6807e0c 100644
 --- a/src/core/namespace.c
 +++ b/src/core/namespace.c
 @@ -149,6 +149,7 @@ static int mount_dev(BindMount *m) {
  const char *d, *dev = NULL, *devpts = NULL, *devshm = NULL, 
 *devhugepages = NULL, *devmqueue = NULL, *devlog = NULL, *devptmx = NULL;
  _cleanup_umask_ mode_t u;
  int r;
 +unsigned long parent_flags;
  
  assert(m);
  
 @@ -159,7 +160,10 @@ static int mount_dev(BindMount *m) {
  
  dev = strappenda(temporary_mount, /dev);
  (void)mkdir(dev, 0755);
 -if (mount(tmpfs, dev, tmpfs, MS_NOSUID|MS_STRICTATIME, 
 mode=755)  0) {
 +r = get_mount_flags(/dev, parent_flags);
 +if (r  0)
 +goto fail;
 +if (mount(tmpfs, dev, tmpfs, 
 parent_flags|MS_NOSUID|MS_STRICTATIME, mode=755)  0) {
  r = -errno;
  goto fail;
  }
 @@ -272,6 +276,7 @@ static int mount_kdbus(BindMount *m) {
  char *busnode = NULL, *root;
  struct stat st;
  int r;
 +unsigned long parent_flags;
  
  assert(m);
  
 @@ -282,7 +287,10 @@ static int mount_kdbus(BindMount *m) {
  
  root = strappenda(temporary_mount, /kdbus);
  (void)mkdir(root, 0755);
 -if (mount(tmpfs, root, tmpfs, MS_NOSUID|MS_STRICTATIME, 
 mode=777)  0) {
 +r = get_mount_flags(/sys/fs/kdbus, parent_flags);
 +if (r  0)
 +goto fail;
 +if (mount(tmpfs, root, tmpfs, 
 parent_flags|MS_NOSUID|MS_STRICTATIME, mode=777)  0) {
  r = -errno;
  goto fail;
  }
 diff --git a/src/shared/util.c b/src/shared/util.c
 index dfaf7f7..b28213f 100644
 --- a/src/shared/util.c
 +++ b/src/shared/util.c
 @@ -61,6 +61,7 @@
  #include sys/personality.h
  #include sys/xattr.h
  #include libgen.h
 +#include sys/statvfs.h
  #undef basename
  
  #ifdef HAVE_SYS_AUXV_H
 @@ -6858,6 +6859,16 @@ int umount_recursive(const char *prefix, int flags) {
  return r ? r : n;
  }
  
 +int get_mount_flags(const char *path, unsigned long *flags) {
 +struct statvfs buf;
 +
 +if (statvfs(path, buf)  0)
 +return -errno;
 +
 +*flags = buf.f_flag;
 +return 0;
 +}
 +
  int bind_remount_recursive(const char *prefix, bool ro) {
  _cleanup_set_free_free_ Set *done = NULL;
  _cleanup_free_ char *cleaned = NULL;
 @@ -6892,6 +6903,7 @@ int bind_remount_recursive(const char *prefix, bool ro) 
 {
  _cleanup_set_free_free_ Set *todo = NULL;
  bool top_autofs = false;
  char *x;
 +unsigned long orig_flags;
  
  todo = set_new(string_hash_ops);
  if (!todo)
 @@ -6969,7 +6981,11 @@ int bind_remount_recursive(const char *prefix, bool 
 ro) {
  if (mount(cleaned, cleaned, NULL, MS_BIND|MS_REC, 
 NULL)  0)
  return -errno;
  
 -if (mount(NULL, prefix, NULL, MS_BIND|MS_REMOUNT|(ro 
 ? MS_RDONLY : 0), NULL)  0)
 +r = get_mount_flags(prefix, orig_flags);
 +if (r  0)
 +return r;
 +orig_flags = ~MS_RDONLY;
 +if (mount(NULL, prefix, NULL, 
 orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0)
  return -errno;
  
  x = strdup(cleaned);
 @@ -6989,7 +7005,11 @@ int bind_remount_recursive(const char *prefix, bool 
 ro) {
  if (r  0)
  return r;
  
 -if (mount(NULL, x, NULL, MS_BIND|MS_REMOUNT|(ro ? 
 MS_RDONLY : 0), NULL)  0) {
 +r = get_mount_flags(x, orig_flags);
 +if (r  0)
 +return r;
 +orig_flags = ~MS_RDONLY;
 +if (mount(NULL, x, NULL, 
 orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0) {
  
  /* Deal with mount points that are
   * obstructed by a later mount */
 diff --git a/src/shared/util.h b/src/shared/util.h
 index a131a3c..d5aa988 100644
 --- a/src/shared/util.h
 +++ b/src/shared/util.h

Re: [systemd-devel] [PATCH v3] Do not clear parent mount flags when setting up namespaces

2015-01-04 Thread Topi Miettinen
On 01/04/15 12:00, Djalal Harouni wrote:
 Hi Topi,
 
 On Sun, Jan 04, 2015 at 11:26:12AM +, Topi Miettinen wrote:
 On 01/03/15 12:58, Topi Miettinen wrote:
 When setting up a namespace, flags like noexec, nosuid and nodev are
 cleared, so the mounts always have exec, suid, dev flags enabled.

 Copy parent directory mount flags when setting up a namespace and
 don't accidentally clear mount flags later.

 Sorry, this version with statvfs() error checks breaks boot. I'm trying
 to find out why.
 Just to let you know in case you are running with kdbus, currently there
 are reports that kdbus+systemd git HEAD is broken, we plan to investigate
 it this week.

Thanks, but I don't have kdbus set up. The problem was with changes to
mount_dev() (by similarity probably also mount_kdbus() could be buggy).

I made a simpler patch which just avoids clobbering the mount flags when
remounting.

-Topi

 
 Thanks!
 
 ---
  src/core/namespace.c | 12 ++--
  src/shared/util.c| 24 ++--
  src/shared/util.h|  2 ++
  3 files changed, 34 insertions(+), 4 deletions(-)

 diff --git a/src/core/namespace.c b/src/core/namespace.c
 index 4b8dbdd..6807e0c 100644
 --- a/src/core/namespace.c
 +++ b/src/core/namespace.c
 @@ -149,6 +149,7 @@ static int mount_dev(BindMount *m) {
  const char *d, *dev = NULL, *devpts = NULL, *devshm = NULL, 
 *devhugepages = NULL, *devmqueue = NULL, *devlog = NULL, *devptmx = NULL;
  _cleanup_umask_ mode_t u;
  int r;
 +unsigned long parent_flags;
  
  assert(m);
  
 @@ -159,7 +160,10 @@ static int mount_dev(BindMount *m) {
  
  dev = strappenda(temporary_mount, /dev);
  (void)mkdir(dev, 0755);
 -if (mount(tmpfs, dev, tmpfs, MS_NOSUID|MS_STRICTATIME, 
 mode=755)  0) {
 +r = get_mount_flags(/dev, parent_flags);
 +if (r  0)
 +goto fail;
 +if (mount(tmpfs, dev, tmpfs, 
 parent_flags|MS_NOSUID|MS_STRICTATIME, mode=755)  0) {
  r = -errno;
  goto fail;
  }
 @@ -272,6 +276,7 @@ static int mount_kdbus(BindMount *m) {
  char *busnode = NULL, *root;
  struct stat st;
  int r;
 +unsigned long parent_flags;
  
  assert(m);
  
 @@ -282,7 +287,10 @@ static int mount_kdbus(BindMount *m) {
  
  root = strappenda(temporary_mount, /kdbus);
  (void)mkdir(root, 0755);
 -if (mount(tmpfs, root, tmpfs, MS_NOSUID|MS_STRICTATIME, 
 mode=777)  0) {
 +r = get_mount_flags(/sys/fs/kdbus, parent_flags);
 +if (r  0)
 +goto fail;
 +if (mount(tmpfs, root, tmpfs, 
 parent_flags|MS_NOSUID|MS_STRICTATIME, mode=777)  0) {
  r = -errno;
  goto fail;
  }
 diff --git a/src/shared/util.c b/src/shared/util.c
 index dfaf7f7..b28213f 100644
 --- a/src/shared/util.c
 +++ b/src/shared/util.c
 @@ -61,6 +61,7 @@
  #include sys/personality.h
  #include sys/xattr.h
  #include libgen.h
 +#include sys/statvfs.h
  #undef basename
  
  #ifdef HAVE_SYS_AUXV_H
 @@ -6858,6 +6859,16 @@ int umount_recursive(const char *prefix, int flags) {
  return r ? r : n;
  }
  
 +int get_mount_flags(const char *path, unsigned long *flags) {
 +struct statvfs buf;
 +
 +if (statvfs(path, buf)  0)
 +return -errno;
 +
 +*flags = buf.f_flag;
 +return 0;
 +}
 +
  int bind_remount_recursive(const char *prefix, bool ro) {
  _cleanup_set_free_free_ Set *done = NULL;
  _cleanup_free_ char *cleaned = NULL;
 @@ -6892,6 +6903,7 @@ int bind_remount_recursive(const char *prefix, bool 
 ro) {
  _cleanup_set_free_free_ Set *todo = NULL;
  bool top_autofs = false;
  char *x;
 +unsigned long orig_flags;
  
  todo = set_new(string_hash_ops);
  if (!todo)
 @@ -6969,7 +6981,11 @@ int bind_remount_recursive(const char *prefix, bool 
 ro) {
  if (mount(cleaned, cleaned, NULL, MS_BIND|MS_REC, 
 NULL)  0)
  return -errno;
  
 -if (mount(NULL, prefix, NULL, 
 MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0)
 +r = get_mount_flags(prefix, orig_flags);
 +if (r  0)
 +return r;
 +orig_flags = ~MS_RDONLY;
 +if (mount(NULL, prefix, NULL, 
 orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0)
  return -errno;
  
  x = strdup(cleaned);
 @@ -6989,7 +7005,11 @@ int bind_remount_recursive(const char *prefix, bool 
 ro) {
  if (r  0)
  return r;
  
 -if (mount(NULL, x, NULL, MS_BIND|MS_REMOUNT|(ro ? 
 MS_RDONLY : 0), NULL)  0) {
 +r = get_mount_flags(x

[systemd-devel] [PATCH v4] Do not clear parent mount flags when setting up namespaces

2015-01-04 Thread Topi Miettinen
When setting up a namespace, mount flags like noexec, nosuid and
nodev are cleared, so the mounts always have exec, suid and dev
flags enabled.

Copy source directory mount flags to target mount when remounting
the bind mounts.
---
 src/shared/util.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/shared/util.c b/src/shared/util.c
index dfaf7f7..8d4e91f 100644
--- a/src/shared/util.c
+++ b/src/shared/util.c
@@ -61,6 +61,7 @@
 #include sys/personality.h
 #include sys/xattr.h
 #include libgen.h
+#include sys/statvfs.h
 #undef basename
 
 #ifdef HAVE_SYS_AUXV_H
@@ -6858,6 +6859,15 @@ int umount_recursive(const char *prefix, int flags) {
 return r ? r : n;
 }
 
+static int get_mount_flags(const char *path, unsigned long *flags) {
+struct statvfs buf;
+
+if (statvfs(path, buf)  0)
+return -errno;
+*flags = buf.f_flag;
+return 0;
+}
+
 int bind_remount_recursive(const char *prefix, bool ro) {
 _cleanup_set_free_free_ Set *done = NULL;
 _cleanup_free_ char *cleaned = NULL;
@@ -6892,6 +6902,7 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 _cleanup_set_free_free_ Set *todo = NULL;
 bool top_autofs = false;
 char *x;
+unsigned long orig_flags;
 
 todo = set_new(string_hash_ops);
 if (!todo)
@@ -6969,7 +6980,11 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 if (mount(cleaned, cleaned, NULL, MS_BIND|MS_REC, 
NULL)  0)
 return -errno;
 
-if (mount(NULL, prefix, NULL, MS_BIND|MS_REMOUNT|(ro ? 
MS_RDONLY : 0), NULL)  0)
+r = get_mount_flags(prefix, orig_flags);
+if (r  0)
+return r;
+orig_flags = ~MS_RDONLY;
+if (mount(NULL, prefix, NULL, 
orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0)
 return -errno;
 
 x = strdup(cleaned);
@@ -6989,7 +7004,11 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 if (r  0)
 return r;
 
-if (mount(NULL, x, NULL, MS_BIND|MS_REMOUNT|(ro ? 
MS_RDONLY : 0), NULL)  0) {
+r = get_mount_flags(x, orig_flags);
+if (r  0)
+return r;
+orig_flags = ~MS_RDONLY;
+if (mount(NULL, x, NULL, 
orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0) {
 
 /* Deal with mount points that are
  * obstructed by a later mount */
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH v2] Do not clear parent mount flags when setting up namespaces

2015-01-03 Thread Topi Miettinen
On 01/02/15 23:53, Djalal Harouni wrote:
 Hi,
 
 On Thu, Jan 01, 2015 at 10:36:39PM +0200, Topi Miettinen wrote:
 Copy parent directory mount flags when setting up a namespace and
 don't accidentally clear mount flags later.
 As noted by Colin in the other email, there should be a git log message
 for why the change.
 
 Yes thank you, I see that in one of the replies of v1 of the patch you
 say why, so just perhaps use it in the commit log and code comment ?
 
 
 ---
  src/core/namespace.c |  4 ++--
  src/shared/util.c| 19 +--
  src/shared/util.h|  2 ++
  3 files changed, 21 insertions(+), 4 deletions(-)

 diff --git a/src/core/namespace.c b/src/core/namespace.c
 index 4b8dbdd..6859b6a 100644
 --- a/src/core/namespace.c
 +++ b/src/core/namespace.c
 @@ -159,7 +159,7 @@ static int mount_dev(BindMount *m) {
  
  dev = strappenda(temporary_mount, /dev);
  (void)mkdir(dev, 0755);
 -if (mount(tmpfs, dev, tmpfs, MS_NOSUID|MS_STRICTATIME, 
 mode=755)  0) {
 +if (mount(tmpfs, dev, tmpfs, 
 get_mount_flags(/dev)|MS_NOSUID|MS_STRICTATIME, mode=755)  0) {
 There is no need for this function to be a parameter
 
 
  r = -errno;
  goto fail;
  }
 @@ -282,7 +282,7 @@ static int mount_kdbus(BindMount *m) {
  
  root = strappenda(temporary_mount, /kdbus);
  (void)mkdir(root, 0755);
 -if (mount(tmpfs, root, tmpfs, MS_NOSUID|MS_STRICTATIME, 
 mode=777)  0) {
 +if (mount(tmpfs, root, tmpfs, 
 get_mount_flags(/sys/fs/kdbus)|MS_NOSUID|MS_STRICTATIME, mode=777)  0) {
  r = -errno;
  goto fail;
  }
 diff --git a/src/shared/util.c b/src/shared/util.c
 index dfaf7f7..8ff5073 100644
 --- a/src/shared/util.c
 +++ b/src/shared/util.c
 @@ -61,6 +61,7 @@
  #include sys/personality.h
  #include sys/xattr.h
  #include libgen.h
 +#include sys/statvfs.h
  #undef basename
  
  #ifdef HAVE_SYS_AUXV_H
 @@ -6858,6 +6859,15 @@ int umount_recursive(const char *prefix, int flags) {
  return r ? r : n;
  }
  
 +unsigned long get_mount_flags(const char *path) {
 +struct statvfs buf;
 +
 +if (statvfs(path, buf)  0)
 +return 0;
 IMO here it should return an errno since this is a helper. In that case
 perhaps just open code the statvfs() or improve the helper ?

I'll make it return errno, for general use it's of course good to know
if there was a problem.

 
 Thanks!
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH v2] Do not clear parent mount flags when setting up namespaces

2015-01-03 Thread Topi Miettinen
On 01/02/15 21:49, Colin Walters wrote:
 On Thu, Jan 1, 2015, at 03:36 PM, Topi Miettinen wrote:
 Copy parent directory mount flags when setting up a namespace and
 don't accidentally clear mount flags later.
 
 I think unless they're obvious, git commits should at least have a brief 
 rationale for *why* you're making the change, not just *what* the change is.  
 That way someone later editing the code has an idea what they might break if 
 they changed it.
 
 Is it something like mounting the tmpfs with options like size=?

It looks like fs specific options do not change when doing the bind
mount. The options that do get cleared unless copied are nosuid, nodev
and noexec flags, maybe also MS_SYNCHRONOUS, MS_POSIXACL etc.

 ___
 systemd-devel mailing list
 systemd-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/systemd-devel
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH v3] Do not clear parent mount flags when setting up namespaces

2015-01-03 Thread Topi Miettinen
When setting up a namespace, flags like noexec, nosuid and nodev are
cleared, so the mounts always have exec, suid, dev flags enabled.

Copy parent directory mount flags when setting up a namespace and
don't accidentally clear mount flags later.
---
 src/core/namespace.c | 12 ++--
 src/shared/util.c| 24 ++--
 src/shared/util.h|  2 ++
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/src/core/namespace.c b/src/core/namespace.c
index 4b8dbdd..6807e0c 100644
--- a/src/core/namespace.c
+++ b/src/core/namespace.c
@@ -149,6 +149,7 @@ static int mount_dev(BindMount *m) {
 const char *d, *dev = NULL, *devpts = NULL, *devshm = NULL, 
*devhugepages = NULL, *devmqueue = NULL, *devlog = NULL, *devptmx = NULL;
 _cleanup_umask_ mode_t u;
 int r;
+unsigned long parent_flags;
 
 assert(m);
 
@@ -159,7 +160,10 @@ static int mount_dev(BindMount *m) {
 
 dev = strappenda(temporary_mount, /dev);
 (void)mkdir(dev, 0755);
-if (mount(tmpfs, dev, tmpfs, MS_NOSUID|MS_STRICTATIME, mode=755) 
 0) {
+r = get_mount_flags(/dev, parent_flags);
+if (r  0)
+goto fail;
+if (mount(tmpfs, dev, tmpfs, 
parent_flags|MS_NOSUID|MS_STRICTATIME, mode=755)  0) {
 r = -errno;
 goto fail;
 }
@@ -272,6 +276,7 @@ static int mount_kdbus(BindMount *m) {
 char *busnode = NULL, *root;
 struct stat st;
 int r;
+unsigned long parent_flags;
 
 assert(m);
 
@@ -282,7 +287,10 @@ static int mount_kdbus(BindMount *m) {
 
 root = strappenda(temporary_mount, /kdbus);
 (void)mkdir(root, 0755);
-if (mount(tmpfs, root, tmpfs, MS_NOSUID|MS_STRICTATIME, 
mode=777)  0) {
+r = get_mount_flags(/sys/fs/kdbus, parent_flags);
+if (r  0)
+goto fail;
+if (mount(tmpfs, root, tmpfs, 
parent_flags|MS_NOSUID|MS_STRICTATIME, mode=777)  0) {
 r = -errno;
 goto fail;
 }
diff --git a/src/shared/util.c b/src/shared/util.c
index dfaf7f7..b28213f 100644
--- a/src/shared/util.c
+++ b/src/shared/util.c
@@ -61,6 +61,7 @@
 #include sys/personality.h
 #include sys/xattr.h
 #include libgen.h
+#include sys/statvfs.h
 #undef basename
 
 #ifdef HAVE_SYS_AUXV_H
@@ -6858,6 +6859,16 @@ int umount_recursive(const char *prefix, int flags) {
 return r ? r : n;
 }
 
+int get_mount_flags(const char *path, unsigned long *flags) {
+struct statvfs buf;
+
+if (statvfs(path, buf)  0)
+return -errno;
+
+*flags = buf.f_flag;
+return 0;
+}
+
 int bind_remount_recursive(const char *prefix, bool ro) {
 _cleanup_set_free_free_ Set *done = NULL;
 _cleanup_free_ char *cleaned = NULL;
@@ -6892,6 +6903,7 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 _cleanup_set_free_free_ Set *todo = NULL;
 bool top_autofs = false;
 char *x;
+unsigned long orig_flags;
 
 todo = set_new(string_hash_ops);
 if (!todo)
@@ -6969,7 +6981,11 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 if (mount(cleaned, cleaned, NULL, MS_BIND|MS_REC, 
NULL)  0)
 return -errno;
 
-if (mount(NULL, prefix, NULL, MS_BIND|MS_REMOUNT|(ro ? 
MS_RDONLY : 0), NULL)  0)
+r = get_mount_flags(prefix, orig_flags);
+if (r  0)
+return r;
+orig_flags = ~MS_RDONLY;
+if (mount(NULL, prefix, NULL, 
orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0)
 return -errno;
 
 x = strdup(cleaned);
@@ -6989,7 +7005,11 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 if (r  0)
 return r;
 
-if (mount(NULL, x, NULL, MS_BIND|MS_REMOUNT|(ro ? 
MS_RDONLY : 0), NULL)  0) {
+r = get_mount_flags(x, orig_flags);
+if (r  0)
+return r;
+orig_flags = ~MS_RDONLY;
+if (mount(NULL, x, NULL, 
orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0) {
 
 /* Deal with mount points that are
  * obstructed by a later mount */
diff --git a/src/shared/util.h b/src/shared/util.h
index a131a3c..d5aa988 100644
--- a/src/shared/util.h
+++ b/src/shared/util.h
@@ -1021,6 +1021,8 @@ union file_handle_union {
 
 int update_reboot_param_file(const char *param);
 
+int get_mount_flags(const char *path, unsigned long *flags);
+
 int umount_recursive(const char *target, int flags);
 
 int bind_remount_recursive(const char *prefix, bool 

[systemd-devel] [PATCH] Do not clear parent mount flags when setting up namespaces

2015-01-01 Thread Topi Miettinen
Copy parent directory mount flags when setting up a namespace and
don't accidentally clear mount flags later.

Signed-off-by: Topi Miettinen toiwo...@gmail.com
---
 src/core/namespace.c |  4 ++--
 src/shared/util.c| 20 ++--
 src/shared/util.h|  2 ++
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/core/namespace.c b/src/core/namespace.c
index 5b408e0..400bc50 100644
--- a/src/core/namespace.c
+++ b/src/core/namespace.c
@@ -159,7 +159,7 @@ static int mount_dev(BindMount *m) {
 
 dev = strappenda(temporary_mount, /dev);
 (void)mkdir(dev, 0755);
-if (mount(tmpfs, dev, tmpfs, MS_NOSUID|MS_STRICTATIME, mode=755) 
 0) {
+if (mount(tmpfs, dev, tmpfs, 
get_mount_flags(/dev)|MS_NOSUID|MS_STRICTATIME, mode=755)  0) {
 r = -errno;
 goto fail;
 }
@@ -282,7 +282,7 @@ static int mount_kdbus(BindMount *m) {
 
 root = strappenda(temporary_mount, /kdbus);
 (void)mkdir(root, 0755);
-if (mount(tmpfs, root, tmpfs, MS_NOSUID|MS_STRICTATIME, 
mode=777)  0) {
+if (mount(tmpfs, root, tmpfs, 
get_mount_flags(/kdbus)|MS_NOSUID|MS_STRICTATIME, mode=777)  0) {
 r = -errno;
 goto fail;
 }
diff --git a/src/shared/util.c b/src/shared/util.c
index dfaf7f7..31fbb68 100644
--- a/src/shared/util.c
+++ b/src/shared/util.c
@@ -61,6 +61,7 @@
 #include sys/personality.h
 #include sys/xattr.h
 #include libgen.h
+#include sys/statvfs.h
 #undef basename
 
 #ifdef HAVE_SYS_AUXV_H
@@ -6858,6 +6859,16 @@ int umount_recursive(const char *prefix, int flags) {
 return r ? r : n;
 }
 
+unsigned long get_mount_flags(const char *path)
+{
+struct statvfs buf;
+
+if (statvfs(path, buf)  0)
+return 0;
+
+return buf.f_flag;
+}
+
 int bind_remount_recursive(const char *prefix, bool ro) {
 _cleanup_set_free_free_ Set *done = NULL;
 _cleanup_free_ char *cleaned = NULL;
@@ -6892,6 +6903,7 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 _cleanup_set_free_free_ Set *todo = NULL;
 bool top_autofs = false;
 char *x;
+unsigned long orig_flags;
 
 todo = set_new(string_hash_ops);
 if (!todo)
@@ -6969,7 +6981,9 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 if (mount(cleaned, cleaned, NULL, MS_BIND|MS_REC, 
NULL)  0)
 return -errno;
 
-if (mount(NULL, prefix, NULL, MS_BIND|MS_REMOUNT|(ro ? 
MS_RDONLY : 0), NULL)  0)
+orig_flags = get_mount_flags(prefix);
+orig_flags = ~MS_RDONLY;
+if (mount(NULL, prefix, NULL, 
orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0)
 return -errno;
 
 x = strdup(cleaned);
@@ -6989,7 +7003,9 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 if (r  0)
 return r;
 
-if (mount(NULL, x, NULL, MS_BIND|MS_REMOUNT|(ro ? 
MS_RDONLY : 0), NULL)  0) {
+orig_flags = get_mount_flags(x);
+orig_flags = ~MS_RDONLY;
+if (mount(NULL, x, NULL, 
orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0) {
 
 /* Deal with mount points that are
  * obstructed by a later mount */
diff --git a/src/shared/util.h b/src/shared/util.h
index a131a3c..4b3070a 100644
--- a/src/shared/util.h
+++ b/src/shared/util.h
@@ -1021,6 +1021,8 @@ union file_handle_union {
 
 int update_reboot_param_file(const char *param);
 
+unsigned long get_mount_flags(const char *path);
+
 int umount_recursive(const char *target, int flags);
 
 int bind_remount_recursive(const char *prefix, bool ro);
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] Type of mount(2) flags is unsigned long

2015-01-01 Thread Topi Miettinen
Signed-off-by: Topi Miettinen toiwo...@gmail.com
---
 src/core/namespace.c | 2 +-
 src/core/namespace.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/core/namespace.c b/src/core/namespace.c
index 400bc50..19a7590 100644
--- a/src/core/namespace.c
+++ b/src/core/namespace.c
@@ -422,7 +422,7 @@ int setup_namespace(
 bool private_dev,
 ProtectHome protect_home,
 ProtectSystem protect_system,
-unsigned mount_flags) {
+unsigned long mount_flags) {
 
 BindMount *m, *mounts = NULL;
 unsigned n;
diff --git a/src/core/namespace.h b/src/core/namespace.h
index 1f9d067..42b92e7 100644
--- a/src/core/namespace.h
+++ b/src/core/namespace.h
@@ -50,7 +50,7 @@ int setup_namespace(char **read_write_dirs,
 bool private_dev,
 ProtectHome protect_home,
 ProtectSystem protect_system,
-unsigned mount_flags);
+unsigned long mount_flags);
 
 int setup_tmp_dirs(const char *id,
   char **tmp_dir,
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] Do not clear parent mount flags when setting up namespaces

2015-01-01 Thread Topi Miettinen
On 01/01/15 14:49, Topi Miettinen wrote:
 Copy parent directory mount flags when setting up a namespace and
 don't accidentally clear mount flags later.

The problem here is that flags noexec, nosuid and nodev are cleared, so
the mounts always have exec, suid, dev flags enabled. With the patch
applied, namespace mounts keep what parent had, for example:

# grep /etc /proc/`pidof acpid`/mounts
/dev/sdb1 /etc ext4
ro,nosuid,nodev,noatime,discard,errors=remount-ro,data=ordered 0 0

-Topi Miettinen

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] Do not clear parent mount flags when setting up namespaces

2015-01-01 Thread Topi Miettinen
On 01/01/15 18:08, Dave Reisner wrote:
 On Thu, Jan 01, 2015 at 04:49:04PM +0200, Topi Miettinen wrote:
 Copy parent directory mount flags when setting up a namespace and
 don't accidentally clear mount flags later.

 Signed-off-by: Topi Miettinen toiwo...@gmail.com
 ---
  src/core/namespace.c |  4 ++--
  src/shared/util.c| 20 ++--
  src/shared/util.h|  2 ++
  3 files changed, 22 insertions(+), 4 deletions(-)

 diff --git a/src/core/namespace.c b/src/core/namespace.c
 index 5b408e0..400bc50 100644
 --- a/src/core/namespace.c
 +++ b/src/core/namespace.c
 @@ -159,7 +159,7 @@ static int mount_dev(BindMount *m) {
  
  dev = strappenda(temporary_mount, /dev);
  (void)mkdir(dev, 0755);
 -if (mount(tmpfs, dev, tmpfs, MS_NOSUID|MS_STRICTATIME, 
 mode=755)  0) {
 +if (mount(tmpfs, dev, tmpfs, 
 get_mount_flags(/dev)|MS_NOSUID|MS_STRICTATIME, mode=755)  0) {
  r = -errno;
  goto fail;
  }
 @@ -282,7 +282,7 @@ static int mount_kdbus(BindMount *m) {
  
  root = strappenda(temporary_mount, /kdbus);
  (void)mkdir(root, 0755);
 -if (mount(tmpfs, root, tmpfs, MS_NOSUID|MS_STRICTATIME, 
 mode=777)  0) {
 +if (mount(tmpfs, root, tmpfs, 
 get_mount_flags(/kdbus)|MS_NOSUID|MS_STRICTATIME, mode=777)  0) {
 
 Shouldn't this be /sys/fs/bus/kdbus? We certainly don't mount kdbusfs in
 the root...

Probably. I don't have kdbus here (sorry) and I don't quite get what the
function is supposed to do.

 
  r = -errno;
  goto fail;
  }
 diff --git a/src/shared/util.c b/src/shared/util.c
 index dfaf7f7..31fbb68 100644
 --- a/src/shared/util.c
 +++ b/src/shared/util.c
 @@ -61,6 +61,7 @@
  #include sys/personality.h
  #include sys/xattr.h
  #include libgen.h
 +#include sys/statvfs.h
  #undef basename
  
  #ifdef HAVE_SYS_AUXV_H
 @@ -6858,6 +6859,16 @@ int umount_recursive(const char *prefix, int flags) {
  return r ? r : n;
  }
  
 +unsigned long get_mount_flags(const char *path)
 +{
 
 Reigning style says to put the opening { at the end of the first line.

OK.

Thanks for the review.

 
 +struct statvfs buf;
 +
 +if (statvfs(path, buf)  0)
 +return 0;
 +
 +return buf.f_flag;
 +}
 +
  int bind_remount_recursive(const char *prefix, bool ro) {
  _cleanup_set_free_free_ Set *done = NULL;
  _cleanup_free_ char *cleaned = NULL;
 @@ -6892,6 +6903,7 @@ int bind_remount_recursive(const char *prefix, bool 
 ro) {
  _cleanup_set_free_free_ Set *todo = NULL;
  bool top_autofs = false;
  char *x;
 +unsigned long orig_flags;
  
  todo = set_new(string_hash_ops);
  if (!todo)
 @@ -6969,7 +6981,9 @@ int bind_remount_recursive(const char *prefix, bool 
 ro) {
  if (mount(cleaned, cleaned, NULL, MS_BIND|MS_REC, 
 NULL)  0)
  return -errno;
  
 -if (mount(NULL, prefix, NULL, 
 MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0)
 +orig_flags = get_mount_flags(prefix);
 +orig_flags = ~MS_RDONLY;
 +if (mount(NULL, prefix, NULL, 
 orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0)
  return -errno;
  
  x = strdup(cleaned);
 @@ -6989,7 +7003,9 @@ int bind_remount_recursive(const char *prefix, bool 
 ro) {
  if (r  0)
  return r;
  
 -if (mount(NULL, x, NULL, MS_BIND|MS_REMOUNT|(ro ? 
 MS_RDONLY : 0), NULL)  0) {
 +orig_flags = get_mount_flags(x);
 +orig_flags = ~MS_RDONLY;
 +if (mount(NULL, x, NULL, 
 orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0) {
  
  /* Deal with mount points that are
   * obstructed by a later mount */
 diff --git a/src/shared/util.h b/src/shared/util.h
 index a131a3c..4b3070a 100644
 --- a/src/shared/util.h
 +++ b/src/shared/util.h
 @@ -1021,6 +1021,8 @@ union file_handle_union {
  
  int update_reboot_param_file(const char *param);
  
 +unsigned long get_mount_flags(const char *path);
 +
  int umount_recursive(const char *target, int flags);
  
  int bind_remount_recursive(const char *prefix, bool ro);
 -- 
 2.1.4

 ___
 systemd-devel mailing list
 systemd-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/systemd-devel

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH v2] Do not clear parent mount flags when setting up namespaces

2015-01-01 Thread Topi Miettinen
Copy parent directory mount flags when setting up a namespace and
don't accidentally clear mount flags later.
---
 src/core/namespace.c |  4 ++--
 src/shared/util.c| 19 +--
 src/shared/util.h|  2 ++
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/src/core/namespace.c b/src/core/namespace.c
index 4b8dbdd..6859b6a 100644
--- a/src/core/namespace.c
+++ b/src/core/namespace.c
@@ -159,7 +159,7 @@ static int mount_dev(BindMount *m) {
 
 dev = strappenda(temporary_mount, /dev);
 (void)mkdir(dev, 0755);
-if (mount(tmpfs, dev, tmpfs, MS_NOSUID|MS_STRICTATIME, mode=755) 
 0) {
+if (mount(tmpfs, dev, tmpfs, 
get_mount_flags(/dev)|MS_NOSUID|MS_STRICTATIME, mode=755)  0) {
 r = -errno;
 goto fail;
 }
@@ -282,7 +282,7 @@ static int mount_kdbus(BindMount *m) {
 
 root = strappenda(temporary_mount, /kdbus);
 (void)mkdir(root, 0755);
-if (mount(tmpfs, root, tmpfs, MS_NOSUID|MS_STRICTATIME, 
mode=777)  0) {
+if (mount(tmpfs, root, tmpfs, 
get_mount_flags(/sys/fs/kdbus)|MS_NOSUID|MS_STRICTATIME, mode=777)  0) {
 r = -errno;
 goto fail;
 }
diff --git a/src/shared/util.c b/src/shared/util.c
index dfaf7f7..8ff5073 100644
--- a/src/shared/util.c
+++ b/src/shared/util.c
@@ -61,6 +61,7 @@
 #include sys/personality.h
 #include sys/xattr.h
 #include libgen.h
+#include sys/statvfs.h
 #undef basename
 
 #ifdef HAVE_SYS_AUXV_H
@@ -6858,6 +6859,15 @@ int umount_recursive(const char *prefix, int flags) {
 return r ? r : n;
 }
 
+unsigned long get_mount_flags(const char *path) {
+struct statvfs buf;
+
+if (statvfs(path, buf)  0)
+return 0;
+
+return buf.f_flag;
+}
+
 int bind_remount_recursive(const char *prefix, bool ro) {
 _cleanup_set_free_free_ Set *done = NULL;
 _cleanup_free_ char *cleaned = NULL;
@@ -6892,6 +6902,7 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 _cleanup_set_free_free_ Set *todo = NULL;
 bool top_autofs = false;
 char *x;
+unsigned long orig_flags;
 
 todo = set_new(string_hash_ops);
 if (!todo)
@@ -6969,7 +6980,9 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 if (mount(cleaned, cleaned, NULL, MS_BIND|MS_REC, 
NULL)  0)
 return -errno;
 
-if (mount(NULL, prefix, NULL, MS_BIND|MS_REMOUNT|(ro ? 
MS_RDONLY : 0), NULL)  0)
+orig_flags = get_mount_flags(prefix);
+orig_flags = ~MS_RDONLY;
+if (mount(NULL, prefix, NULL, 
orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0)
 return -errno;
 
 x = strdup(cleaned);
@@ -6989,7 +7002,9 @@ int bind_remount_recursive(const char *prefix, bool ro) {
 if (r  0)
 return r;
 
-if (mount(NULL, x, NULL, MS_BIND|MS_REMOUNT|(ro ? 
MS_RDONLY : 0), NULL)  0) {
+orig_flags = get_mount_flags(x);
+orig_flags = ~MS_RDONLY;
+if (mount(NULL, x, NULL, 
orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL)  0) {
 
 /* Deal with mount points that are
  * obstructed by a later mount */
diff --git a/src/shared/util.h b/src/shared/util.h
index a131a3c..4b3070a 100644
--- a/src/shared/util.h
+++ b/src/shared/util.h
@@ -1021,6 +1021,8 @@ union file_handle_union {
 
 int update_reboot_param_file(const char *param);
 
+unsigned long get_mount_flags(const char *path);
+
 int umount_recursive(const char *target, int flags);
 
 int bind_remount_recursive(const char *prefix, bool ro);
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel