Re: [systemd-devel] Making /run respect Container Memory Limits

2024-09-23 Thread Demi Marie Obenour
On Mon, Sep 23, 2024 at 04:14:58PM +0200, Lennart Poettering wrote:
> On Mo, 23.09.24 11:58, Matthew Ife (matt...@ife.onl) wrote:
> 
> > > /run/ is only mounted by systemd if it is not pre-mounted already by
> > > the container manager. We generally assume the container manager does
> > > that (for example systemd-nspawn does it that way), already because
> > > /run/host/ is the mechanism to pass outside info/resources into the
> > > container in a systemd world, hence it really needs to be premounted.
> >
> > I think the above is enough to know the right answer.
> > Fix the container manager to behave correctly. This feels like the most 
> > elegant approach.
> >
> > I didn't spot this when trying to understand the best approach to change 
> > things. Apologies.
> >
> > Note, you're right about how we do stupid things like disabling swap. It's
> > not my call, sadly!
> > Whilst I don't think the answer here is "adding swap will fix", there are a
> > myriad other reasons to have swap, and it would at least elongate the
> > cliff-edge we otherwise have with this problem.
> 
> Adding swap *will* fix the issue for you btw to a large degree.
> 
> By not having swap you make it impossible for tmpfs and anonymous
> memory to be paged out. You basically *create* an artificial OOM
> situation if any load shows up, because you artificially minimize the
> amount of reclaimable pages: in most cases only mapped ELF binaries
> become reclaimable this way, so they will be constantly thrashed and
> everything goes to shit.
> 
> If you disable swap on a big server you are just misunderstanding how
> memory management works on Linux, and it's pretty much your own
> fault. This might sound harsh, but it is how it is.

Does this mean that if something can't afford its working set to be
paged out for latency reasons, it _also_ can't afford its own code to be
paged out, and therefore should call mlockall() or otherwise explicitly
mlock() the code and data it is operating on, rather than expecting that
swap be disabled?
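
To make the question concrete, here is a minimal sketch of what I mean by locking everything. It is in Python via ctypes purely for illustration; a real latency-sensitive daemon would make the same call from its own native code. The MCL_* values are the generic Linux ones and are an assumption (a few architectures use different constants), and the `libc` parameter is a hypothetical hook so the logic can be exercised without touching real memory limits.

```python
import ctypes
import ctypes.util
import os

# Generic Linux values from <sys/mman.h>; some architectures (e.g. powerpc,
# sparc) use different constants, so treat these as an assumption.
MCL_CURRENT = 1  # lock pages that are mapped right now
MCL_FUTURE = 2   # also lock pages that get mapped later


def lock_all_memory(libc=None) -> bool:
    """Ask the kernel to keep the whole address space resident.

    Returns True on success and False if mlockall() failed, which usually
    means RLIMIT_MEMLOCK is too low or CAP_IPC_LOCK is missing.
    """
    if libc is None:
        name = ctypes.util.find_library("c") or "libc.so.6"
        libc = ctypes.CDLL(name, use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        err = ctypes.get_errno()
        print(f"mlockall failed: {os.strerror(err)}")
        return False
    return True
```

With MCL_CURRENT the already-mapped program text is locked along with the data, which is the "code and data" case asked about above.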

> Talk to whoever maintains these systems, and get them to talk to some MM
> person and get educated about these things. There's a fundamental
> misunderstanding here of how loaded systems need to be managed.
> 
> And if you then combine this with non-persistent journald, you are
> artificially amplifying the problem you artificially created for
> yourself, because you intentionally moved even more stuff that would
> normally be backed by disk into unreclaimable memory.
> 
> Lennart

Most (but not all) of the security concerns about swap can be mitigated
by using a dm-crypt volume with an ephemeral key.  Once the system
memory is wiped, the key is gone and with it any chance of accessing the
swapped-out data.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] namespace problem

2024-07-19 Thread Demi Marie Obenour
On Fri, Jul 19, 2024 at 12:08:58AM +0300, Mantas Mikulėnas wrote:
> On Thu, Jul 18, 2024, 15:43 Thomas Köller  wrote:
> 
> > Am 18.07.24 um 14:04 schrieb Mantas Mikulėnas:
> > > Yes, but namespace persistence actually relies on filesystem access –
> > > it's implemented as a bind-mount of the namespace file descriptor (onto
> > > /run/netns for the 'ip netns' tool), as otherwise namespaces only exist
> > > as long as processes that hold them.
> > >
> > > So if you have any service options that cause a new *mount* namespace to
> > > be created (preventing its filesystem mounts from being visible outside
> > > the unit), then it cannot pin persistent network namespaces.
> >
> > Quoting the manual page:
> > ProtectSystem=
> > Takes a boolean argument or the special values "full" or
> > "strict". If true, mounts the /usr/ and the boot loader directories
> > (/boot and /efi) read-only for processes invoked by this unit. If set
> > to "full", the /etc/ directory is mounted read-only, too.
> >
> > No mention of /var or /run.
> 
> 
> It still works this way whether it's mentioned or not. Once the unit's
> process is put in a new mount namespace, the entire `/` is marked private
> so that any mounts made underneath `/` remain visible only in that
> namespace. This equally affects the "read-only /etc" mount done by systemd
> itself as well as the /run/netns mount done by 'ip' or any other mounts
> done anywhere else.

This still ought to be mentioned in the documentation.  Not everyone
knows that persistent network namespaces involve bind mounts, and it is
much better for the caveat to be mentioned in the manual pages.
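
As an illustration of why the caveat matters: whether a bind mount made inside a unit escapes to the host depends on the propagation mode of the mounts in the unit's namespace, and that can be read from /proc/<pid>/mountinfo. The sketch below is mine, not anything from systemd or iproute2; the function names are hypothetical. It only relies on the documented mountinfo layout, where optional fields such as `shared:N` or `master:N` sit between the mount options and a lone `-`.

```python
def propagation(mountinfo_line: str) -> str:
    """Classify one /proc/<pid>/mountinfo line as shared, slave or private."""
    fields = mountinfo_line.split()
    # Fields 0-5 are fixed; zero or more optional fields follow, then "-".
    sep = fields.index("-", 6)
    optional = fields[6:sep]
    # A mount can be both a slave and shared; report "shared" in that case
    # because mounts made below it still propagate to its peers.
    if any(f.startswith("shared:") for f in optional):
        return "shared"
    if any(f.startswith("master:") for f in optional):
        return "slave"
    return "private"


def root_propagation(pid: str = "self") -> str:
    """Return the propagation of the mount at `/` as seen by `pid`."""
    result = "unknown"
    with open(f"/proc/{pid}/mountinfo") as f:
        for line in f:
            if line.split()[4] == "/":
                result = propagation(line)  # the last match is the visible one
    return result
```

If `root_propagation()` run from inside the service reports anything other than "shared", a /run/netns bind mount created there will not be visible from the host namespace.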
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] Question about the killing spree during the transition from the initrd to the root file system.

2024-07-09 Thread Demi Marie Obenour
On Tue, Jul 09, 2024 at 12:13:38PM +0200, Lennart Poettering wrote:
> On Mo, 08.07.24 15:57, Demi Marie Obenour (d...@invisiblethingslab.com) wrote:
> 
> > On Mon, Jul 08, 2024 at 01:16:56PM +0200, Lennart Poettering wrote:
> > > On Do, 04.07.24 12:44, Demi Marie Obenour (d...@invisiblethingslab.com) wrote:
> > >
> > > > > No, these belong to your process, systemd couldn't really reach into
> > > > > your processes to close them, even if it wanted to.
> > > > >
> > > > > But do note that any files you keep open or mapped at the moment of
> > > > > transition will remain pinned in memory, and cannot be released by the
> > > > > kernel. This means that even though during the tmpfs→host transition
> > > > > we generally destroy the initrd's tmpfs' contents, the stuff you keep
> > > > > pinned will stick around.
> > > > >
> > > > > Generally, only special purpose software should be left around that
> > > > > way, if it is carefully written to handle this. For example it is not
> > > > > allowed to dlopen() anything (and hence no NSS either! No
> > > > > gethostbyname() or getpwnam() or so), because you'd otherwise end up
> > > > > with a weird mix and match of shared libs from the initrd and the host.
> > > >
> > > > If one does need to e.g. do DNS lookups in such a process, what is the
> > > > best way to do it?
> > >
> > > Well, simply don't write programs like that, of course.
> > >
> > > But if you really feel you must:
> > >
> > > If you need DNS, then do the lookups via your own statically linked
> > > DNS lib maybe?
> > >
> > > You could talk to resolved's varlink or D-Bus interfaces too, but I
> > > find this a bit icky, since you'll end up consuming services provided
> > > by the OS on the root fs, while you should instead provide services to
> > > that OS, but not consume them.
> > >
> > > If you want user/group name resolution: these are generally a resource
> > > managed by the host OS, hence you almost certainly are doing things
> > > wrong if you want to resolve them from your initrd service. You could
> > > talk to userdbd of course, via Varlink IPC, but the same applies as
> > > above: it's a bit icky if you consume services provided by the OS, if
> > > you are such a low-level daemon that must survive from initrd into
> > > host.
> > >
> > > In many ways: if you run like this you should consider yourself
> > > conceptually closer to kernelspace than to userspace. And hence, the
> > > same way as kernelspace generally doesn't resolve users or hostnames
> > > you shouldn't really either.
> >
> > What is the most common use-case for such daemons?  I thought that it
> > was for network-attached root filesystems.  Such a daemon might well
> > need to do DNS lookups.
> 
> As I said above, if you really can't avoid DNS, then do DNS, but do it
> yourself, i.e. add your own DNS client, and do not use OS services for
> this. i.e. no NSS that involves dlopen() on modules from the rootfs or
> talks to IPC services of the OS.

Would talking to systemd-resolved from the host OS do any damage?  I
agree that it isn't elegant, but neither is having completely separate
DNS configuration that can and will get out of sync.  At a minimum, is
it okay to listen for changes to systemd-resolved's configuration, so
that the daemon's own resolver can stay in sync?

I know that needing a call to systemd-resolved to e.g. resolve a page
fault from systemd-resolved itself will deadlock.  I'm assuming that the
daemon is designed so that this never happens.  And yes, it might be
better to go through a proxy process from the root FS in this case.
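
For concreteness, this is roughly what talking to resolved over Varlink would involve on the wire. It is a sketch only, in Python for readability (the daemon in question would do the same from statically linked code). The socket path and the `io.systemd.Resolve.ResolveHostname` method are resolved's published interface as far as I know; the helper names and the timeout value are my own. Varlink messages are JSON objects terminated by a NUL byte.

```python
import json
import socket

# Well-known systemd-resolved Varlink socket; it only exists once the
# host's resolved is running, so callers must cope with it being absent.
RESOLVE_SOCKET = "/run/systemd/resolve/io.systemd.Resolve"


def encode_call(method: str, parameters: dict) -> bytes:
    """Serialize one Varlink call: a JSON object followed by a NUL byte."""
    message = {"method": method, "parameters": parameters}
    return json.dumps(message).encode("utf-8") + b"\0"


def resolve_hostname(name: str, path: str = RESOLVE_SOCKET,
                     timeout: float = 5.0) -> dict:
    """Send ResolveHostname and return the decoded reply object."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)  # never block forever on a host service
        sock.connect(path)
        sock.sendall(encode_call("io.systemd.Resolve.ResolveHostname",
                                 {"name": name}))
        buf = b""
        while b"\0" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("resolved closed the connection early")
            buf += chunk
    return json.loads(buf.split(b"\0", 1)[0])
```

The timeout is the important design choice here: since the daemon must never depend on the host OS to make progress, every call into a host service needs a bounded wait and a fallback.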
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] Question about the killing spree during the transition from the initrd to the root file system.

2024-07-08 Thread Demi Marie Obenour
On Mon, Jul 08, 2024 at 01:16:56PM +0200, Lennart Poettering wrote:
> On Do, 04.07.24 12:44, Demi Marie Obenour (d...@invisiblethingslab.com) wrote:
> 
> > > No, these belong to your process, systemd couldn't really reach into
> > > your processes to close them, even if it wanted to.
> > >
> > > But do note that any files you keep open or mapped at the moment of
> > > transition will remain pinned in memory, and cannot be released by the
> > > kernel. This means that even though during the tmpfs→host transition
> > > we generally destroy the initrd's tmpfs' contents, the stuff you keep
> > > pinned will stick around.
> > >
> > > Generally, only special purpose software should be left around that
> > > way, if it is carefully written to handle this. For example it is not
> > > allowed to dlopen() anything (and hence no NSS either! No
> > > gethostbyname() or getpwnam() or so), because you'd otherwise end up
> > > with a weird mix and match of shared libs from the initrd and the host.
> >
> > If one does need to e.g. do DNS lookups in such a process, what is the
> > best way to do it?
> 
> Well, simply don't write programs like that, of course.
> 
> But if you really feel you must:
> 
> If you need DNS, then do the lookups via your own statically linked
> DNS lib maybe?
> 
> You could talk to resolved's varlink or D-Bus interfaces too, but I
> find this a bit icky, since you'll end up consuming services provided
> by the OS on the root fs, while you should instead provide services to
> that OS, but not consume them.
> 
> If you want user/group name resolution: these are generally a resource
> managed by the host OS, hence you almost certainly are doing things
> wrong if you want to resolve them from your initrd service. You could
> talk to userdbd of course, via Varlink IPC, but the same applies as
> above: it's a bit icky if you consume services provided by the OS, if
> you are such a low-level daemon that must survive from initrd into
> host.
> 
> In many ways: if you run like this you should consider yourself
> conceptually closer to kernelspace than to userspace. And hence, the
> same way as kernelspace generally doesn't resolve users or hostnames
> you shouldn't really either.

What is the most common use-case for such daemons?  I thought that it
was for network-attached root filesystems.  Such a daemon might well
need to do DNS lookups.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] Question about the killing spree during the transition from the initrd to the root file system.

2024-07-04 Thread Demi Marie Obenour
On Thu, Jul 04, 2024 at 03:18:04PM +0200, Lennart Poettering wrote:
> On Do, 04.07.24 11:24, chenruyi (A) (chenru...@huawei.com) wrote:
> 
> > Hi,
> >
> > I have some processes in my initrd that need to be excluded from the killing
> > spree during switch-root and need to continue to run in the root file system.
> > I read ROOT_STORAGE_DAEMONS.md and the source code of killall.c, and I've
> > learned that there are methods to exclude processes from the killing spree,
> > such as setting `argv[0][0]` to `@`.
> >
> > However, I'm not sure if this is without potential consequences. For example,
> > could it be that even though my processes survive, some resources that the
> > processes depend on are discarded after switch-root, such as file
> > descriptors?
> 
> No, these belong to your process, systemd couldn't really reach into
> your processes to close them, even if it wanted to.
> 
> But do note that any files you keep open or mapped at the moment of transition
> will remain pinned in memory, and cannot be released by the
> kernel. This means that even though during the tmpfs→host transition
> we generally destroy the initrd's tmpfs' contents, the stuff you keep
> pinned will stick around.
> 
> Generally, only special purpose software should be left around that
> way, if it is carefully written to handle this. For example it is not
> allowed to dlopen() anything (and hence no NSS either! No
> gethostbyname() or getpwnam() or so), because you'd otherwise end up
> with a weird mix and match of shared libs from the initrd and the host.

If one does need to e.g. do DNS lookups in such a process, what is the
best way to do it?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] Please clarify osVersion in ELF package metadata

2024-06-18 Thread Demi Marie Obenour
On Tue, Jun 18, 2024 at 11:24:22AM +0200, Benjamin Drung wrote:
> On Mon, 2024-06-17 at 11:19 -0500, Greg Oliver wrote:
> > On Mon, Jun 17, 2024 at 10:38 AM Benjamin Drung  wrote:
> > > On Mon, 2024-06-17 at 14:54 +0100, Luca Boccassi wrote:
> > > > On Mon, 17 Jun 2024 at 14:45, Benjamin Drung  wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > Ubuntu started to implement the ELF package metadata spec. It encodes
> > > > > the VERSION_ID from os-release in the osVersion field. Using VERSION_ID
> > > > > was objected to because the version is only set in stone once the
> > > > > release is done. It could change during the development cycle. See
> > > > > https://lists.ubuntu.com/archives/ubuntu-devel/2024-June/043027.html
> > > > > and https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/2069599
> > > > > 
> > > > > The proposal is to use VERSION_CODENAME from os-release instead.
> > > > > 
> > > > > To me it is not clear enough what is the best approach regarding the
> > > > > spec https://systemd.io/ELF_PACKAGE_METADATA/ here.
> > > > > 
> > > > > The key description says "typically"? So could we just use
> > > > > VERSION_CODENAME for osVersion?
> > > > > 
> > > > > Or should we use a different key like osVersionCodename to allow
> > > > > third-party users to still use VERSION_ID for osVersion? In that case
> > > > > osVersionCodename should probably be added to the well-known keys.
> > > > > 
> > > > > What's your take on it?
> > > > 
> > > > Hi,
> > > > 
> > > > I replied on ubuntu-devel but it's moderated, so the message didn't
> > > > make it through and is waiting for approval.
> > > > 
> > > > The gist of it is that this is supposed to be machine readable, and be
> > > > what is commonly understood as the version, which for the next ubuntu
> > > > version would be 23.10.
> > > > 
> > > > Given it's sourced from os-release, which is sourced from base-files,
> > > > ideally you'd do an archive-wide rebuild once it is finalized (that
> > > > also gives you builds with newer compiler hardening and other
> > > > niceties). If that's not possible or not wanted, simply omit the
> > > > osVersion field. Parsers need to expect that to be optional, in order
> > > > to avoid breaking on rolling release distros like Arch that do not
> > > > have a version.
> > > 
> > > From that perspective Debian and Ubuntu are semi-rolling releases:
> > > Packages are copied over from one release to another. As long as there
> > > is no new upload happening for the package between two releases, the
> > > identical binary package will be shipped. So osVersion would still be
> > > unchanged. So osVersion indicates in which OS version the package was
> > > introduced, but not which release it is running on. Do you suggest
> > > omitting osVersion because of that?
> > > 
> > > My initial question targets a different problem: The version number is
> > > finalized (set in stone) on release date. Ubuntu was released on time
> > > except for one case. In such a case, where we need more time and delay the
> > > release, we won't have time to start an archive-wide rebuild of all
> > > packages just to correct osVersion in the ELF objects. On the other hand
> > > the version codename is set in stone when the archive for that release
> > > is opened. That's why it was suggested to use the version codename
> > > instead of the version ID.
> > 
> > IMHO, a rolling release is just that - it is self explanatory.  Debian and
> > Ubuntu are definitely not that.  In your given scenario, the packages should
> > be rebuilt for the current OS Release with the metadata bumped even if it
> > is the same version of said package.  Also, you will definitely be bumping
> > the C libraries with each OS version bump, so you would always want to
> > re-compile them with the current libraries and keep them separate via the
> > OS release based repository directories - yes?  I think over-engineering
> > is going on here :) 
> 
> No, Debian and Ubuntu are much bigger than other distributions.
> Currently there are 38,579 source packages in Ubuntu. We will not
> rebuild them every six months for a new release. There will be new builds
> of the package in case it gets updated/changed or a used library
> transitions from one ABI to another.

How long (in terms of machine time) would be needed to rebuild the
world?  Fedora does do mass rebuilds for each release.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] Systemd, cgrupsv2, cgrulesengd, and nftables

2024-06-14 Thread Demi Marie Obenour
On Fri, Jun 14, 2024 at 10:06:34AM +0200, Mikhail Morfikov wrote:
> On 13/06/2024 10.27 pm, Lennart Poettering wrote:
> > On Do, 13.06.24 21:38, Mikhail Morfikov (mmorfi...@gmail.com) wrote:
> > 
> > > > I'm trying to make the 4 things (systemd, cgroupsv2, cgrulesengd, and
> > > > nftables)
> > > work together, but I think I'm missing something.
> > 
> > Is "cgrulesengd" interfering with the cgroup tree?
> > 
> > Sorry, but that's simply not supported. cgroupv2 has a single-writer
> > rule, i.e. every part of the tree has only a single writer, a single
> > manager. And you must delegate a subtree to other managers if a
> > different manager shall also manage cgroups.
> > 
> > Hence, if you have something that just takes systemd managed processes
> > and moves them elsewhere, it's simply not supported. Sorry, you voided
> > your warranty.
> > 
> > Lennart
> > 
> > --
> > Lennart Poettering, Berlin
> 
> I don't need any warranty, I need a way to make this work.

I don't know anything about cgrulesengd, but from your post it seems
that it relies on scanning all processes and moving them to cgroups
based on information about them.  This isn't compatible with systemd.
There are a few options that will work:

1. Change cgrulesengd to use systemd's D-Bus API to manage cgroups.
2. Run everything in a container that doesn't use systemd.
3. Stop using cgrulesengd, and instead use systemd units to define
   cgroups.  Then use other approaches (such as wrapper scripts) to
   ensure that programs are launched in the correct systemd units.
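
A wrapper for option 3 could look something like the sketch below. It assumes a slice unit has already been defined on the system; the slice name used in the usage note is made up. `systemd-run --scope --slice=` are existing systemd-run options, but the helper functions are only an illustration.

```python
import shlex
import subprocess


def scope_command(slice_name: str, argv: list[str], user: bool = False) -> list[str]:
    """Build a systemd-run invocation that starts `argv` in a transient
    scope unit placed under `slice_name`."""
    cmd = ["systemd-run", "--scope", f"--slice={slice_name}"]
    if user:
        cmd.insert(1, "--user")  # talk to the per-user manager instead
    # "--" ends option parsing so the program's own flags pass through.
    return cmd + ["--"] + list(argv)


def launch(slice_name: str, argv: list[str], user: bool = False) -> int:
    """Run the program via systemd-run and return its exit status."""
    cmd = scope_command(slice_name, argv, user)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd).returncode
```

For example, `launch("limited.slice", ["firefox"], user=True)` would have systemd create the cgroup and move the process there itself, so nothing else has to write to the cgroup tree.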
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] PCR signing / enrolling on UKI and validation by systemd-cryptenroll

2024-05-30 Thread Demi Marie Obenour
On Thu, May 30, 2024 at 11:22:56PM +0200, Lennart Poettering wrote:
> On Do, 30.05.24 22:43, Lennart Poettering (lenn...@poettering.net) wrote:
> 
> > > What about combining two different secrets, such that _both_ must be
> > > accessible?  At a minimum, something like HASH(SECRET1||SECRET2) is
> > > guaranteed to be available if and only if both SECRET1 and SECRET2 are
> > > available.  This won't work with TPM-bound keys that are not accessible
> > > outside the TPM, but my understanding is that the most common cases
> > > (LUKS and fscrypt keys and systemd credentials) must be accessible in
> > > cleartext on the host _anyway_.  If the secret to be sealed is provided
> > > externally, then one can use symmetric encryption with a randomly
> > > generated key to have the same effect.
> >
> > Hmm, this is an interesting idea, I kinda like it. But I am not sure
> > how far this will get us, because I think even for FDE we eventually
> > want to store asymmetric keys, not symmetric ones (i.e. I think we
> > should start supporting things like TPM2+FIDO or TPM2+PKCS11 or
> > TPM2+ssh-agent where both devices operate in tandem, in a challenge
> > response model, not sure how far you get with that if we can only
> > protect symmetric keys)
> 
> Eh, I might have figured out a way how I can do this, somewhat
> inspired by this:
> 
> TPMs implement hierarchies of keys after all where each key is wrapped
> by its parent, and you can apparently nest things pretty liberally, to
> as many levels as one likes.
> 
> So here's what systemd's TPM2-based FDE does right now:
> 
> When enrolling: it ensures that a "storage root key" (SRK) exists on
> the TPM. It then loads the plaintext FDE encryption key as a symmetric
> key into the TPM, so that it is "wrapped" by the SRK. It then reads
> back the wrapped (i.e. encrypted) key (this is called "sealing") and
> writes that to the LUKS superblock. When unlocking we take that
> wrapped key, load it back into the TPM and then read back the
> plaintext key (this is called "unsealing"). Since the SRK is specific
> to the TPM only the TPM can give us access to our FDE key. This model
> is then enriched with TPM2 "extended policies" which we set while
> sealing and which tell the TPM to insist that during unsealing the
> PCRs are in a specific state.
> 
> So far so good. This allows us to define *one* extended policy for the
> FDE key. And as mentioned that's a problem for us, because we'd like
> to define *two* extended policies (i.e. the pcrlock one, and the
> signed PCR one). But if we take advantage of the fact that we can wrap keys
> arbitrarily we can do it like this:
> 
> when enrolling: as before, take care of the SRK. But now generate
> another key, wrapped by the SRK and with our first policy built into
> it. And then seal the FDE key against that "intermediate" key, and
> build our 2nd policy into that sealing.
> 
> To unlock we then first have to load the intermediate key (which will
> just work) and then load the FDE key below it (which will require us
> to fulfill policy 1) and then unseal the FDE key (which will
> require us to fulfill policy 2).
> 
> Unless I am missing something this should work and do exactly what I
> want: I can combine policies arbitrarily.

Does this require policies 1 and 2 to be fulfilled _at the same time_?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] PCR signing / enrolling on UKI and validation by systemd-cryptenroll

2024-05-30 Thread Demi Marie Obenour
On Thu, May 30, 2024 at 10:43:48PM +0200, Lennart Poettering wrote:
> On Mi, 29.05.24 14:48, Demi Marie Obenour (d...@invisiblethingslab.com) wrote:
> 
> > > > > (you can of course include PolicyAuthorizeNV in the policy you sign
> > > > > for PolicyAuthorize, but that doesn't work, since we want to pin the
> > > > > local nvindex really, and allocate it locally, and the signer (i.e. the
> > > > > OS vendor) cannot possibly do that. Or you could include the
> > > > > PolicyAuthorize in the policy you store in the nvindex for
> > > > > PolicyAuthorizeNV use, but that feels much less interesting since it
> > > > > means the enforcement of the combination is subject to local,
> > > > > delegated policy choices instead of mandated by the policy of the
> > > > > actual object we want to protect)
> > >
> > > this here is where i discuss what you are saying ^^^
> > >
> > > so technically this works, but this means objects are effectively
> > > protected by local policy only. And whether to also protect by OS vendor
> > > policy is then a choice of the local policy, but not a choice of the
> > > original object's policy anymore. Or in other words: that shifts
> > > around who owns which part of the policy. Ideally we want that when I
> > > create a protected object in the TPM I can say: "to unlock this you
> > > *must* validate OS vendor policy *and* local pcrlock policy". But you
> > > cannot do that. You can only say "to unlock this you *must* validate
> > > local pcrlock policy", and then hope that that local policy also
> > > enforces validation via OS vendor policy.
> >
> > What about combining two different secrets, such that _both_ must be
> > accessible?  At a minimum, something like HASH(SECRET1||SECRET2) is
> > guaranteed to be available if and only if both SECRET1 and SECRET2 are
> > available.  This won't work with TPM-bound keys that are not accessible
> > outside the TPM, but my understanding is that the most common cases
> > (LUKS and fscrypt keys and systemd credentials) must be accessible in
> > cleartext on the host _anyway_.  If the secret to be sealed is provided
> > externally, then one can use symmetric encryption with a randomly
> > generated key to have the same effect.
> 
> Hmm, this is an interesting idea, I kinda like it. But I am not sure
> how far this will get us, because I think even for FDE we eventually
> want to store asymmetric keys, not symmetric ones (i.e. I think we
> should start supporting things like TPM2+FIDO or TPM2+PKCS11 or
> TPM2+ssh-agent where both devices operate in tandem, in a challenge
> response model, not sure how far you get with that if we can only
> protect symmetric keys)

How would TPM2+FIDO work?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] PCR signing / enrolling on UKI and validation by systemd-cryptenroll

2024-05-29 Thread Demi Marie Obenour
On Wed, May 29, 2024 at 04:54:13PM +0200, Lennart Poettering wrote:
> On Mi, 29.05.24 17:00, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> 
> > If you use pcrlock for more flexibility it will change into
> >
> > PolicyPCR(PCR1, PCR2, ...)
> > PolicyAuthorize
> > PolicyPCR(PCR3, PCR4, ...)
> > PolicyOR(digest1, digest2, ...)
> > PolicyAuthorizeNV
> > Unseal
> 
> When you do this then the policy made up of the three expressions in
> the middle would have to be stored in the nvindex. Which you
> definitely can do, and this is exactly what I discussed below, see
> below:
> 
> > > (you can of course include PolicyAuthorizeNV in the policy you sign
> > > for PolicyAuthorize, but that doesn't work, since we want to pin the
> > > local nvindex really, and allocate it locally, and the signer (i.e. the
> > > OS vendor) cannot possibly do that. Or you could include the
> > > PolicyAuthorize in the policy you store in the nvindex for
> > > PolicyAuthorizeNV use, but that feels much less interesting since it
> > > means the enforcement of the combination is subject to local,
> > > delegated policy choices instead of mandated by the policy of the
> > > actual object we want to protect)
> 
> this here is where i discuss what you are saying ^^^
> 
> so technically this works, but this means objects are effectively
> protected by local policy only. And whether to also protect by OS vendor
> policy is then a choice of the local policy, but not a choice of the
> original object's policy anymore. Or in other words: that shifts
> around who owns which part of the policy. Ideally we want that when I
> create a protected object in the TPM I can say: "to unlock this you
> *must* validate OS vendor policy *and* local pcrlock policy". But you
> cannot do that. You can only say "to unlock this you *must* validate
> local pcrlock policy", and then hope that that local policy also
> enforces validation via OS vendor policy.

What about combining two different secrets, such that _both_ must be
accessible?  At a minimum, something like HASH(SECRET1||SECRET2) is
guaranteed to be available if and only if both SECRET1 and SECRET2 are
available.  This won't work with TPM-bound keys that are not accessible
outside the TPM, but my understanding is that the most common cases
(LUKS and fscrypt keys and systemd credentials) must be accessible in
cleartext on the host _anyway_.  If the secret to be sealed is provided
externally, then one can use symmetric encryption with a randomly
generated key to have the same effect.
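
A minimal sketch of the combination, assuming SHA-256 as the hash. The length prefix on each secret is my addition: plain concatenation would let ("ab", "c") and ("a", "bc") produce the same input. A proper KDF such as HKDF would be the more conservative choice in real code; this only shows the shape of the idea.

```python
import hashlib


def combine_secrets(secret1: bytes, secret2: bytes) -> bytes:
    """Derive one 32-byte key that requires knowing both input secrets."""
    h = hashlib.sha256()
    for part in (secret1, secret2):
        # 8-byte big-endian length prefix makes the encoding unambiguous.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()
```

Here `secret1` would be unsealed under the signed PCR policy and `secret2` under the pcrlock policy, and only the combined output would be used as the volume key.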
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] PCR signing / enrolling on UKI and validation by systemd-cryptenroll

2024-05-29 Thread Demi Marie Obenour
On Wed, May 29, 2024 at 10:36:28AM +0200, Lennart Poettering wrote:
> On Di, 28.05.24 17:36, Demi Marie Obenour (d...@invisiblethingslab.com) wrote:
> 
> > > (you can of course include PolicyAuthorizeNV in the policy you sign
> > > for PolicyAuthorize, but that doesn't work, since we want to pin the
> > > local nvindex really, and allocate it locally, and the signer (i.e. the
> > > OS vendor) cannot possibly do that. Or you could include the
> > > PolicyAuthorize in the policy you store in the nvindex for
> > > PolicyAuthorizeNV use, but that feels much less interesting since it
> > > means the enforcement of the combination is subject to local,
> > > delegated policy choices instead of mandated by the policy of the
> > > actual object we want to protect)
> >
> > Does this work in practice?  I agree that this is ugly, but "ugly" might
> > be better than "not working".
> 
> Well, it should work. I am still not ready to give up on finding a
> better solution to this. For example, I have some vague hopes that we
> can make TPM "tickets" work for this.
> 
> As I understand tickets would allow us to validate policies once,
> which would give us a "ticket" back for that that is valid for a
> specific time. Then we can bind the policies of other objects to the
> availability of such valid tickets, and then combine two ticket
> validations that way.
> 
> Superficially that would do what we need. i.e. if I get one ticket for
> the signed PCR policy (i.e. for the PolicyAuthorize thing) and another
> ticket for the pcrlock policy (i.e. the PolicyAuthorizeNV thing) then I
> can build a policy checking both tickets and be fine.
> 
> Except that things aren't that easy (well, the above isn't precisely
> "easy" either), because suddenly a time-out comes into play, and we
> lose this nice "fuse blowing" feature of PCRs: i.e. while we boot we
> measure the boot phase into PCR 11 after all, to ensure that secrets
> that shall only be possible to be unlocked in — let's say – the initrd
> cannot possibly be unlocked any later, because the PCR is "destroyed"
> via the later phase measurement. If we use tickets we could still
> unlock things till the end of the timeout, which we probably have to
> pick large because of differences of boot speeds, hence this
> compromises security quite a bit I'd say.
> 
> Hence, maybe tickets aren't the way to go, they bring complexity, they
> would make a pretty relevant feature of our policies go down the drain
> – even though they would combine the two relevant policies correctly.

What about inserting an explicit delay into the boot process until the
ticket expires?
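
For anyone following along, the "fuse blowing" property quoted above comes from PCR extension being one-way: a PCR can only be replaced by a hash over its old value plus the new measurement. A small model of the SHA-256 bank, with phase strings chosen for illustration rather than copied from what the boot phase measurement actually writes:

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for the SHA-256 bank


def pcr_extend(pcr: bytes, event_data: bytes) -> bytes:
    """Model of extend: new PCR = SHA256(old PCR || SHA256(event data))."""
    return hashlib.sha256(pcr + hashlib.sha256(event_data).digest()).digest()


def replay(events) -> bytes:
    """Replay a list of measurements starting from an all-zero PCR."""
    pcr = bytes(PCR_SIZE)
    for event in events:
        pcr = pcr_extend(pcr, event)
    return pcr
```

Once a later phase has been measured, no further sequence of extends is expected to bring the PCR back to the initrd-time value, so a policy bound to that value stops being satisfiable immediately. A ticket with a timeout does not have that property, hence the question about waiting it out.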
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] PCR signing / enrolling on UKI and validation by systemd-cryptenroll

2024-05-28 Thread Demi Marie Obenour
On Tue, May 28, 2024 at 09:55:36PM +0200, Lennart Poettering wrote:
> On Di, 28.05.24 21:21, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> 
> > On 28.05.2024 17:49, Lennart Poettering wrote:
> > >
> > > systemd-cryptenroll supports pin, literal PCR, signed PCR, in any
> > > combination. (plus pcrlock, but that currently cannot be combined
> > > with signed PCR, because afaics it is not expressible in the TPM policy
> > > language).
> > >
> >
> > Why not? You can AND pcrlock with other policies just like currently literal
> > PCR is ANDed with signed PCR. You can even use signed PCR in pcrlock policy
> > - PolicyOR does not care what policies are combined, literal PCR (like is
> > done currently) or signed PCR. Or what semantic do you have in mind that
> > cannot be expressed?
> 
> pcrlock is ultimately a PolicyAuthorizeNV policy, and signed policies
> use PolicyAuthorize. Both of these policy items do not *extend* the
> policy so far enqueued, but *replace* it instead. (This is different
> from policies such as PolicyPCR or PolicyAuthValue and so on, which
> result in extension, i.e. "AND") Thus, there's no directly obvious
> way how you could combine them.
> 
> (you can of course include PolicyAuthorizeNV in the policy you sign
> for PolicyAuthorize, but that doesn't work, since we want to pin the
> local nvindex really, and allocate it locally, and the signer (i.e. the
> OS vendor) cannot possibly do that. Or you could include the
> PolicyAuthorize in the policy you store in the nvindex for
> PolicyAuthorizeNV use, but that feels much less interesting since it
> means the enforcement of the combination is subject to local,
> delegated policy choices instead of mandated by the policy of the
> actual object we want to protect)

Does this work in practice?  I agree that this is ugly, but "ugly" might
be better than "not working".

> I have so far not found a nice way out of this problem. Seems to be a
> limitation of the TPM policy language.
> 
> Lennart

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
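
The "extend" versus "replace" distinction in the quoted text can be
modelled in a few lines. This is a deliberately simplified sketch: a
real TPM hashes command codes and object names in a fixed binary layout
and verifies a signature ticket for PolicyAuthorize, whereas the byte
strings, function names and the "approved" parameter here are
assumptions made for illustration.

```python
import hashlib

ZERO = bytes(32)  # a fresh policy session starts with an all-zero digest


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def policy_pcr(digest: bytes, pcr_value: bytes) -> bytes:
    # Extending item: the result depends on everything asserted so far ("AND").
    return h(digest, b"PolicyPCR", pcr_value)


def policy_authorize(digest: bytes, approved: bytes, key_name: bytes) -> bytes:
    # Replacing item: the digest so far must be one the signer approved,
    # then it is discarded and the session restarts from zero.
    if digest != approved:
        raise ValueError("policy so far was not approved by the signing key")
    return h(ZERO, b"PolicyAuthorize", key_name)


def policy_authorize_nv(digest: bytes, nv_contents: bytes, nv_name: bytes) -> bytes:
    # Same shape, but the approved digest is read from a local NV index.
    if digest != nv_contents:
        raise ValueError("policy so far does not match the NV index contents")
    return h(ZERO, b"PolicyAuthorizeNV", nv_name)


signed_a = policy_pcr(ZERO, b"pcr11 for kernel A")
signed_b = policy_pcr(ZERO, b"pcr11 for kernel B")
via_a = policy_authorize(signed_a, approved=signed_a, key_name=b"vendor key")
via_b = policy_authorize(signed_b, approved=signed_b, key_name=b"vendor key")
print(via_a == via_b)   # True: whatever came before was replaced

local = policy_pcr(ZERO, b"pcrlock prediction")
via_nv = policy_authorize_nv(local, nv_contents=local, nv_name=b"local nvindex")
print(via_nv == via_a)  # False: the two items end in unrelated digests
```

In this model, running PolicyAuthorize after PolicyAuthorizeNV would
need the vendor to have approved via_nv, which embeds the name of a
locally allocated NV index. That is the chaining problem described in
the quoted text.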


Re: [systemd-devel] Submitting a service activation to remote mounts success

2024-02-06 Thread Demi Marie Obenour
On Tue, Feb 06, 2024 at 05:06:02PM +0100, Silvio Knizek wrote:
> On Tuesday, 06.02.2024 at 16:15 +0100, Thomas HUMMEL wrote:
> > Hello,
> >
> > I'm using systemd-239-74 on RHEL 8.8 EUS.
> >
> > I was wondering if one can express the following :
> >
> > start some service *only and only if/when* all remote mounts (ex: nfs,
> > some parallel fs) have *succeeded*, taking into account it may take some
> > time for some mount (some fs clients just live curl | sh themselves at
> > start !) to finish (which seems to exclude usage of
> > AssertPathIsMountPoint for instance, as it would not wait, or would it ?)
> >
> > I have no auto option in the fstab for those fs and they use the _netdev
> > option
> >
> > Obviously I could statically list all the mounts units as an ordering
> > dependency but this is not what I was looking for as there are many (and
> > I'm not even sure - see below - if it would be enough)
> >
> > Exploring this question I stumbled upon the following points :
> >
> > my understanding is that:
> >
> > 1. remote-fs.target special target is pulled in by multi-user.target and
> > is added by systemd-fstab-generator as a Before= ordering dep to all
> > remote .mount units
> >
> > -> I also see a remote-fs.target has a Requires=
> > activation dep : I probably missed it in the doc but I don't see this
> > listed in either implicit or default deps : where does it come from ?
> >
> > 2. Before=/After= refer, in the case of service units, to when the unit
> > has "finished starting up", this being defined by "when it returns
> > failed or success", which is dependent on the Type= of the service
> >
> > Is this understanding correct ?
> >
> > But when the unit is of type mount : what's the semantic of Before/After
> > ? (I don't think I saw it in the doc either)
> >
> > What's the meaning/use of Type=none in a .mount unit ?
> >
> > My experience is that the mount may fail and remote-fs.target will still
> > be reached, even if one replaces Requires with BindsTo, correct ?
> >
> > So success or failure of the mount process does not seem to be involved
> > in the ordering dep, or does it ?
> >
> > Thanks for your help
> 
> Hi Thomas,
> 
> RequiresMountsFor=3D should be your friend. It just takes a space-
> separated list of paths and does all the other stuff by itself.
> 
> Another option would be to switch to x-systemd.automount in fstab for
> the network shares, so they will be mounted on first access, not
> necessarily during early boot when there is no network.

FYI, it looks like your mailer used quoted-printable encoding, but
didn’t set the appropriate headers to indicate that this encoding is in
use.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
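
For reference, quoted-printable writes "=" as "=3D", may write a space
as "=20", and uses a trailing "=" as a soft line break. A receiving
client only undoes this when the Content-Transfer-Encoding header
announces it. A minimal sketch using Python's standard quopri module;
the sample strings are made up for illustration:

```python
import quopri

samples = [
    b"Before=3D/After=3D",     # "=3D" is an encoded "="
    b"when the un=\nit has",   # a trailing "=" is a soft line break
    b"usage of=20AssertPath",  # "=20" is an encoded space
]

for raw in samples:
    print(quopri.decodestring(raw))
```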


Re: [systemd-devel] Bump: Testing LogFilterPatterns= on user-level services

2024-01-26 Thread Demi Marie Obenour
On Fri, Jan 26, 2024 at 09:11:24AM +0100, Lennart Poettering wrote:
> On Do, 25.01.24 22:29, Farblos (akfkqu.9df...@vodafonemail.de) wrote:
> 
> > Hi.
> >
> > I sent below mail some week ago, Barry's reply left me unsure as to
> > whether this would be a bug or not.  I still tend to assume that I'm
> > "doing something wrong".
> 
> This is currently not supported. The filters are communicated by the
> service manager to journald via xattrs on the cgroups, and journald
> will only consider those for cgroups owned by root, i.e. not on
> cgroups delegated to unpriv users like is done for systemd --user
> instances.
> 
> Interpreting arbitrary regexes configured by unpriv code in priv code
> comes at some risk, because afair constructing them can take O(2^n)
> time, i.e. a rogue regex could make us consume unbounded time on
> processing journal messages.

Which regex engine is used?  glibc’s engine is not safe for use with
untrusted input, but Rust’s is, so that might be an option in the
future.  It isn’t OOM-safe, though.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
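
Exponential cost can show up when a regex is compiled or when a
backtracking engine matches it. The sketch below models only the second
case, and it is a counting model rather than any real engine's code: it
counts how many ways a pattern shaped like (a+)+b can split a run of n
"a" characters, each of which a naive backtracking matcher would try
before giving up on the missing "b". The function name is made up for
illustration.

```python
def backtracking_attempts(n: int) -> int:
    """Count the ways to split n identical characters into non-empty runs."""
    if n == 0:
        return 1  # one complete split reached; the matcher now fails on the missing 'b'
    # The inner a+ may take 1..n characters; the outer + then repeats on the rest.
    return sum(backtracking_attempts(n - take) for take in range(1, n + 1))


for n in (4, 8, 16):
    print(n, backtracking_attempts(n))
```

The count doubles with every extra character (8, 128, 32768 for the
three inputs above), which is why a pattern supplied by unprivileged
code is risky to evaluate in a privileged process unless the engine
guarantees linear-time matching.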


Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-12 Thread Demi Marie Obenour
On Tue, Dec 12, 2023 at 06:40:32PM +0100, Lennart Poettering wrote:
> On Mo, 11.12.23 12:48, Eric Curtin (ecur...@redhat.com) wrote:
> 
> > Although the nice thing about a storage-init like approach is there's
> > basically zero copies up front. What storage-init is trying to be, is
> > a tool to just call systemd storage things, without also inheriting
> > all the systemd stack.
> 
> Just to make this clear: using things like systemd-cryptsetup outside
> of the systemd stack is not going to work once you leave trivial
> setups. i.e. the TPM hookup involves multiple services these days, and
> it's not going to get any simpler. i.e. systemd-tpm2-setup,
> systemd-pcrextend, systemd-pcrlock and so on. I am sorry, but doing
> reasonable disk encryption with TPM involved means you either buy into
> the whole systemd offer (i.e. with the service manager) or you have to
> rewrite your own systemd.
> 
> But maybe I am misunderstanding what you are saying here.

I think a key factor here is that the initial suggestion was for
automotive use cases.  One can have a vastly simpler system if one is
willing to deliver hardware-specific images, rather than trying to have
a single image that supports many different hardware models.  Automotive
and other embedded systems understandably do not want to pay for
complexity that they do not need, and which is present to support
features (such as supporting arbitrary hardware) they will never use.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


Re: IPv6 Compliance for networkd

2023-12-11 Thread Demi Marie Obenour
On Mon, Dec 11, 2023 at 10:52:31PM +, Muggeridge, Matt wrote:
> 
> 
> > -Original Message-
> > From: Demi Marie Obenour 
> > Sent: Tuesday, December 12, 2023 7:14 AM
> > To: Muggeridge, Matt ; systemd-
> > de...@lists.freedesktop.org
> > Subject: Re: IPv6 Compliance for networkd
> > 
> > On Mon, Dec 11, 2023 at 07:14:27PM +, Muggeridge, Matt wrote:
> > > Hello, networkd developer community,
> > >
> > > I am hoping to rally support for making networkd IPv6 compliant and I'm
> > > willing to help, but cannot do it alone. Is there any interest in making
> > > systemd-networkd IPv6 compliant?
> > >
> > > There are many organizations (especially US Government) that mandate
> > > IPv6 compliance (USGv6).  Products that are dependent on networkd cannot
> > > be bid to these customers.
> > >
> > > How do I engage with the right people in the developer community?
> > >
> > > Thanks,
> > > Matt.
> > > PS: Mailing list topics go unanswered and github issues get lost in the
> > > noise, so I'm hoping there's a more efficient way to collaborate.
> > 
> > In what specific ways is networkd not compliant?
> > --
> > Sincerely,
> > Demi Marie Obenour (she/her/hers)
> > Invisible Things Lab
> 
> Hi Demi,
> 
> > In what specific ways is networkd not compliant?
> 
> Refer to previous mailing list topics [1] and github issues, especially any 
> issues opened by LiveFreeAndRoam [2].
> 
> Are you a networkd developer?  Are you willing to collaborate on this?
> 
> [1] 
> https://www.mail-archive.com/search?a=1&l=systemd-devel%40lists.freedesktop.org&haswords=ipv6+compliance&x=0&y=0&from=&subject=&datewithin=1d&date=&notwords=&o=relevance
> [2] 
> https://github.com/systemd/systemd/issues?q=is%3Aissue+author%3Alivefreeandroam

If you need these problems fixed so that you can use systemd-networkd in
your commercial products, I recommend getting your company to pay
developers to fix systemd-networkd.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Demi Marie Obenour
On Mon, Dec 11, 2023 at 08:58:58PM +, Luca Boccassi wrote:
> On Mon, 11 Dec 2023 at 20:43, Demi Marie Obenour
>  wrote:
> >
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA512
> >
> > On Mon, Dec 11, 2023 at 08:15:27PM +, Luca Boccassi wrote:
> > > On Mon, 11 Dec 2023 at 17:30, Demi Marie Obenour
> > >  wrote:
> > > >
> > > > On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > > > > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> > > > >
> > > > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > > > storage devices initialized. storage-init is a process that is not
> > > > > > designed to replace init, it does just enough to initialize storage
> > > > > > (performs a targeted udev trigger on storage), switches to
> > > > > > initoverlayfs as root and then executes init.
> > > > > >
> > > > > > ```
> > > > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > > > >
> > > > > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > > > > ```
> > > > >
> > > > > I am not sure I follow what these chains are supposed to mean? Why are
> > > > > there two lines?
> > > > >
> > > > > So, I generally would agree that the current initrd scheme is not
> > > > > ideal, and we have been discussing better approaches. But I am not
> > > > > sure your approach really is useful on generic systems for two
> > > > > reasons:
> > > > >
> > > > > 1. no security model? you need to authenticate your initrd in
> > > > >    2023. There's no excuse for not doing that anymore these days. Not
> > > > >    in automotive, and not anywhere else really.
> > > > >
> > > > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > > > >    unlock their root disks with TPM2 and similar things. People use
> > > > >    RAID, LVM, and all that mess.
> > > > >
> > > > > Actually the above are kinda the same problem in a way: you need
> > > > > complex storage, but if you need that you kinda need udev, and
> > > > > services, and then also systemd and all that other stuff, and that's
> > > > > why the system works like the system works right now.
> > > > >
> > > > > Whenever you devise a system like yours by cutting corners, and
> > > > > declaring that you don't want TPM, you don't want signed initrds, you
> > > > > don't want to support weird storage, you just solve your problem in a
> > > > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > > > actually really work without all that and are willing to maintain the
> > > > > solution for your specific problem only.
> > > > >
> > > > > As I understand you are trying to solve multiple problems at once
> > > > > here, and I think one should start with figuring out clearly what
> > > > > those are before trying to address them, maybe without compromising on
> > > > > security. So my guess is you want to address the following:
> > > > >
> > > > > 1. You don't want the whole big initrd to be read off disk on every
> > > > >    boot, but only the parts of it that are actually needed.
> > > > >
> > > > > 2. You don't want the whole big initrd to be fully decompressed on every
> > > > >    boot, but only the parts of it that are actually needed.
> > > > >
> > > > > 3. You want to share data between root fs and initrd
> > > > >
> > > > > 4. You want to save some boot time by not bringing up an init system
> > > > >    in the initrd once, then tearing it down again, and starting it
> > > > >    again from the root fs.
> > > > >
> > > > > For the items listed above I think you can find different solutions
> > > > > which do not necessarily compro

Re: IPv6 Compliance for networkd

2023-12-11 Thread Demi Marie Obenour
On Mon, Dec 11, 2023 at 07:14:27PM +, Muggeridge, Matt wrote:
> Hello, networkd developer community,
> 
> I am hoping to rally support for making networkd IPv6 compliant and I'm willing
> to help, but cannot do it alone. Is there any interest in making
> systemd-networkd IPv6 compliant?
> 
> There are many organizations (especially US Government) that mandate IPv6 
> compliance (USGv6).  Products that are dependent on networkd cannot be bid to 
> these customers.
> 
> How do I engage with the right people in the developer community?
> 
> Thanks,
> Matt.
> PS: Mailing list topics go unanswered and github issues get lost in the 
> noise, so I'm hoping there's a more efficient way to collaborate.

In what specific ways is networkd not compliant?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Demi Marie Obenour
On Mon, Dec 11, 2023 at 08:15:27PM +, Luca Boccassi wrote:
> On Mon, 11 Dec 2023 at 17:30, Demi Marie Obenour
>  wrote:
> >
> > On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > > On Fr, 08.12.23 17:59, Eric Curtin (ecur...@redhat.com) wrote:
> > >
> > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > storage devices initialized. storage-init is a process that is not
> > > > designed to replace init, it does just enough to initialize storage
> > > > (performs a targeted udev trigger on storage), switches to
> > > > initoverlayfs as root and then executes init.
> > > >
> > > > ```
> > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > >
> > > > fw -> bootloader -> kernel -> storage-init   -> init ->
> > > > ```
> > >
> > > I am not sure I follow what these chains are supposed to mean? Why are
> > > there two lines?
> > >
> > > So, I generally would agree that the current initrd scheme is not
> > > ideal, and we have been discussing better approaches. But I am not
> > > sure your approach really is useful on generic systems for two
> > > reasons:
> > >
> > > 1. no security model? you need to authenticate your initrd in
> > >    2023. There's no excuse for not doing that anymore these days. Not
> > >    in automotive, and not anywhere else really.
> > >
> > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > >    unlock their root disks with TPM2 and similar things. People use
> > >    RAID, LVM, and all that mess.
> > >
> > > Actually the above are kinda the same problem in a way: you need
> > > complex storage, but if you need that you kinda need udev, and
> > > services, and then also systemd and all that other stuff, and that's
> > > why the system works like the system works right now.
> > >
> > > Whenever you devise a system like yours by cutting corners, and
> > > declaring that you don't want TPM, you don't want signed initrds, you
> > > don't want to support weird storage, you just solve your problem in a
> > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > actually really work without all that and are willing to maintain the
> > > solution for your specific problem only.
> > >
> > > As I understand you are trying to solve multiple problems at once
> > > here, and I think one should start with figuring out clearly what
> > > those are before trying to address them, maybe without compromising on
> > > security. So my guess is you want to address the following:
> > >
> > > 1. You don't want the whole big initrd to be read off disk on every
> > >    boot, but only the parts of it that are actually needed.
> > >
> > > 2. You don't want the whole big initrd to be fully decompressed on every
> > >    boot, but only the parts of it that are actually needed.
> > >
> > > 3. You want to share data between root fs and initrd
> > >
> > > 4. You want to save some boot time by not bringing up an init system
> > >    in the initrd once, then tearing it down again, and starting it
> > >    again from the root fs.
> > >
> > > For the items listed above I think you can find different solutions
> > > which do not necessarily compromise security as much.
> > >
> > > So, in the list above you could address the latter three like this:
> > >
> > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > >    loader load the erofs into contiguous memory, then use memmap=X!Y on
> > >    the kernel cmdline to synthesize a block device from that, which
> > >    you then mount directly (without any initrd) via
> > >    root=/dev/pmem0. This means your boot loader will still load the
> > >    whole image into memory, but only decompress the bits actually
> > >    needed. (It also has some other nice benefits I like, such as an
> > >    immutable rootfs, which tmpfs-based initrds don't have.)
> > >
> > > 3. Simply never transition to the root fs, don't mark the initrds in
> > >    systemd's eyes as an initrd (specifically: don't add an
&

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Demi Marie Obenour
On Mon, Dec 11, 2023 at 05:03:13PM +, Eric Curtin wrote:
> On Mon, 11 Dec 2023 at 16:36, Demi Marie Obenour
>  wrote:
> >

Re: [RFC] initoverlayfs - a scalable initial filesystem

2023-12-11 Thread Demi Marie Obenour
 have been
> discussing this off an on internally too. A generic solution to this
> is hard. My current thinking for this could be something like this,
> covering the UEFI world: support sticking a DDI for the main initrd in
> the ESP. The ESP is per definition unencrypted and unauthenticated,
> but otherwise relatively well defined, i.e. known to be vfat and
> discoverable via UUID on a GPT disk. So: build a minimal
> single-process initrd into the kernel (i.e. UKI) that has exactly the
> storage to find a DDI on the ESP, and set it up. i.e. vfat+erofs fs
> drivers, and dm-verity. Then have a PID 1 that does exactly enough to
> jump into the rootfs stored in the ESP. That latter then has proper
> file system drivers, storage drivers, crypto stack, and can unlock the
> real root. This would still be a pretty specific solution to one set
> of devices though, as it could not cover network boots (i.e. where
> there is just no ESP to boot from), but I think this could be kept
> relatively close, as the logic in that case could just fall back into
> loading the DDI that normally would sit in the ESP fully into
> memory.

I don't think this is "a pretty specific solution to one set of devices"
_at all_.  To the contrary, it is _exactly_ what I want to see desktop
systems moving to in the future.

It solves the problem of large firmware images.  It solves the problem
of device-specific configuration, because one can use a file on the EFI
system partition that is read by userspace and either treated as
untrusted or TPM-signed.  It means that one has a complete set of
recovery tools in the event of a problem, rather than being limited to
whatever one can squeeze into an initramfs.  One can even include a full
GUI stack (with accessibility support!), rather than just Plymouth.  For
Qubes OS, one can include enough of the Xen and Qubes toolstack to even
launch virtual machines, allowing the use of USB devices and networking
for recovery purposes.  It even means that one can use a FIDO2 token to
unlock the hard drive without a USB stack on the host.  And because the
initramfs _only_ needs to load the boot extension volume, it can be
very, _very_ small, which works great with using Linux as a coreboot
payload.

The only problem I can see that this does not solve is network boot, but
that is very much a niche use case when compared to the millions of
Fedora or Debian desktop installs, or even the tens of thousands of
Qubes OS installs.  Furthermore, I would _much_ rather network boot be
handled by userspace and kexec, rather than the closed source UEFI network
stack.

It does require some care when upgrading, as the dm-verity image and the
UKI cannot both be updated atomically, but one can solve that by first
writing the new dm-verity image to a separate location.  The UKI will
try both the old and new locations for the dm-verity image and
rename the new image over the old one on success.  The wrong image will
simply fail to mount as its root hash will be wrong.

This even allows Apple-esque boot policies to be implemented on
commodity hardware, provided that the system firmware is sufficiently
hardened.  It won't be as good as what Apple does, but it will be a huge
win from what is possible today.

> (If you are focussing on systems lacking UEFI, then replace the word
> "ESP" in the above with a similar concept, i.e. a well discoverable,
> unauthenticated relatively simple file system, such as vfat).
> 
> Anyway, I can't tell you how to solve your specific problems, but if
> there's one thing I'd suggest you to keep in mind then it's the
> security angle, i.e. keep in mind from the beginning how
> authentication of every component of your process shall work, how
> unattended disk encryption shall operate and how measurement shall
> work. Security must be built into things from the beginning, not be
> added as an afterthought.

As a Qubes OS developer and a security researcher, thank you.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
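
The upgrade scheme described in the message above (look in both
locations, rename the new image over the old one on success) could be
sketched roughly as follows. This is only an illustration: a plain
SHA-256 of the file stands in for dm-verity root hash verification, and
the function and parameter names are hypothetical.

```python
import hashlib
import os
from pathlib import Path
from typing import Optional


def root_hash(image: Path) -> str:
    # Stand-in for dm-verity verification: a plain SHA-256 over the file.
    return hashlib.sha256(image.read_bytes()).hexdigest()


def select_image(current: Path, staged: Path, expected_hash: str) -> Optional[Path]:
    """Return the image matching expected_hash, promoting the staged one if it matches."""
    if staged.exists() and root_hash(staged) == expected_hash:
        os.replace(staged, current)  # rename the new image over the old one
        return current
    if current.exists() and root_hash(current) == expected_hash:
        return current
    return None  # neither image belongs to this UKI
```

An old UKI carries the old hash, so it ignores the staged image and
keeps booting the current one; a new UKI carries the new hash and
promotes the staged image on first boot. os.replace is atomic on POSIX
filesystems; whether a vfat ESP gives the same guarantee is outside the
scope of this sketch.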


[systemd-devel] systemd-pcrlock: what prevents unauthorized changes to the NV index?

2023-12-05 Thread Demi Marie Obenour
What prevents unauthorized changes to the NV index used by
systemd-pcrlock?  Is the secret key itself stored in the NV index, with
the policy deciding who can read the key?  Or does the policy on the NV
index require that the policy established by systemd-pcrlock is itself
satisfied before the NV index can be changed?  In the latter case, does
this mean that the index can be "leaked" in certain error conditions?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


Re: [systemd-devel] [help] Benchmarking software shows degraded performance

2023-11-30 Thread Demi Marie Obenour
On Wed, Nov 29, 2023 at 09:36:04PM -0800, Christian Hergert wrote:
> On 11/29/23 8:09 PM, nayabbasha.sa...@microchip.com wrote:
> > One of the benchmark cases simply opens a graphical window on the
> > LCD screen and then closes it. For this case, egt-benchmark
> > shows 9 iterations/sec for busybox init, and only 5 iterations/sec
> > for systemd init.
> 
> Have you run perf or some other whole-system profiler on the system to see
> what time is spent in what process and how the systemd case differs from the
> busybox case?

Does perf even support these single core SoCs?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


Re: [systemd-devel] setting cpulimit/iolimit on mysql thread not entire process

2023-11-27 Thread Demi Marie Obenour
On Tue, Nov 28, 2023 at 08:35:29AM +0200, Mantas Mikulėnas wrote:
> On Tue, Nov 28, 2023 at 8:27 AM jai  wrote:
> 
> > I am able to set cpulimit, iolimit, etc for a process using its pid
> > through cgroups v2. But for some threads of a single mysql process, how can
> > I achieve that?
> >
> 
> You cannot; 1) the limits are per-cgroup and the entire service is a single
> cgroup; 2) the threads are created by mysqld, not by systemd, and systemd
> does not monitor and move service processes across cgroups once the service
> is already running; 3) afaik, in cgroups v2 it isn't even allowed for
> threads of a single process to straddle multiple cgroups anymore.
> 
> I'm not a DBA but I've heard that one common way to handle this would be to
> create a separate MySQL instance (probably on a separate machine, even)
> that would replicate all the data, for the heavy users to query. (Or the
> other way around, main instance for the heavy updates ⇒ replica for regular
> queries.)

Generally heavy analytical queries should be on a replica.  The reason
is that analytical queries are less likely to need the very latest
data, whereas transactions probably do.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


Re: [systemd-devel] Starting a service before any networking

2023-09-26 Thread Demi Marie Obenour
On Tue, Sep 26, 2023 at 11:50:55AM +0100, Mark Rogers wrote:
> I'm sure this is trivial but I've gone round in circles without success.
> 
> I have a script which reads from an SQLite database and generates various
> system configuration files - at the moment these are dhcpcd.conf and
> wpa_supplicant.conf but this might grow in future.
> 
> As such the only dependency the script has is that the filesystem is up and
> running. But the script must complete before anything that the script
> manages the configuration file for.
> 
> My current unit looks like this:
> [Unit]
> Before=networking.service
> After=local-fs.target
> 
> [Service]
> Type=oneshot
> ExecStart=/path/to/script
> 
> [Install]
> RequiredBy=network.target
> 
> Where am I going wrong and what is the right way to do this?
> 
> I've also tried Before=network-pre.target and Wants=network-pre.target
> without success - it was that not working that set me off trying to fix it.

RequiredBy=network-pre.target should be sufficient, but unfortunately
lots of stuff (like systemd-networkd) that should have
Requires=network-pre.target doesn't.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


Re: [systemd-devel] Normal user can ask status of services

2023-08-27 Thread Demi Marie Obenour
On Sun, Aug 27, 2023 at 07:35:53PM +0200, Cecil Westerhof wrote:
> On Sun, 27 Aug 2023 at 18:30, Leon Fauster wrote:
> 
> > On 26.08.23 at 18:41, Cecil Westerhof wrote:
> > > Replying on google does not work as I am used to. It sends to the sender
> > > instead of the group. 😱
> > >
> > > On Sat, 26 Aug 2023 at 18:36, Cecil Westerhof
> > > <cldwester...@gmail.com> wrote:
> > >
> > > On Sat, 26 Aug 2023 at 14:46, Michael Biebl <mbi...@gmail.com> wrote:
> > >
> > > On Sat, 26 Aug 2023 at 09:44, Cecil Westerhof
> > > <cldwester...@gmail.com> wrote:
> > >  >
> > >  > I am at last implementing systemd timers. The service I
> > > created can have its status queried by a normal user. I thought
> > > I must have made a mistake. But when I do:
> > >  > systemctl status cron
> > >  >
> > >  > I get:
> > >  > ● cron.service - Regular background program processing
> > daemon
> > >  >  Loaded: loaded (/lib/systemd/system/cron.service;
> > > enabled; preset: enabled)
> > >  >  Active: active (running) since Sat 2023-08-19
> > > 18:12:04 CEST; 6 days ago
> > >  >Docs: man:cron(8)
> > >  >Main PID: 790 (cron)
> > >  >   Tasks: 1 (limit: 17837)
> > >  >  Memory: 91.0M
> > >  > CPU: 14min 3.110s
> > >  >  CGroup: /system.slice/cron.service
> > >  >  └─790 /usr/sbin/cron -f
> > >  >
> > >  > Warning: some journal files were not opened due to
> > > insufficient permissions.
> > >  >
> > >  > Is this the expected behaviour?
> > >  > If not: what could be wrong with my system?
> > >  >
> > >  > This is on Debian 11.
> > >
> > > Reading system logs is a privileged operation.
> > >
> > > You can grant this privilege to individual users by adding them
> > > to the
> > > systemd-journal (or adm) group.
> > >
> > > Adding users to the adm will grant them additional privileges,
> > > so be careful.
> > >
> > >
> > > The user is in the lpadmin group, but not in systemd-journal, or adm
> > > and still can ask the status.
> > > Another reply indicates that this is normal.
> > >
> >
> >
> > Well, you can look at the process list anytime as a normal user. So, what
> > are you trying to accomplish? What's the goal? Hiding the process from
> > the users?
> >
> 
> I was surprised that I could see it. And as I understand it, I am certainly
> not the only one. One reply to my question even said that it is a privileged
> operation and should not be possible unless the user is in a group that
> this user was not in.
> I agree that you can find out everything with ps, but that is a lot more
> work.
> I was just surprised that it was possible (and again, I am far from the only
> one). I just wanted to check it out, and now I know it is expected behaviour.
> Better to ask a 'dumb' question than to stay ignorant, I think.

Also, access to other users' processes in /proc can be disabled with
the hidepid=2 mount option on /proc.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] deprecating Forward-Secure Sealing (FSS) in the journal

2023-07-30 Thread Demi Marie Obenour
On Sun, Jul 30, 2023 at 08:35:24PM +0100, Dave Howorth wrote:
> On Sun, 30 Jul 2023 11:52:34 -0400
> Demi Marie Obenour wrote:
> > On Thu, Jul 27, 2023 at 08:10:41AM +, Zbigniew Jędrzejewski-Szmek
> > wrote:
> > > Hi,
> > > 
> > > I'd like to start $subject. First, we'd just add an entry in NEWS
> > > and make the key generation code print a warning, but then in a
> > > release or few remove the code.
> > > 
> > > See
> > > https://github.com/systemd/systemd/pull/28433/commits/1ecd1a994733d.
> > > 
> > > If you're using FSS, please speak up.
> > > 
> > > Zbyszek  
> > 
> > What is the reason for this change?
> 
> Does the comment in the commit not answer that?

It does, sorry.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] deprecating Forward-Secure Sealing (FSS) in the journal

2023-07-30 Thread Demi Marie Obenour
On Thu, Jul 27, 2023 at 08:10:41AM +, Zbigniew Jędrzejewski-Szmek wrote:
> Hi,
> 
> I'd like to start $subject. First, we'd just add an entry in NEWS
> and make the key generation code print a warning, but then in a release
> or few remove the code.
> 
> See https://github.com/systemd/systemd/pull/28433/commits/1ecd1a994733d.
> 
> If you're using FSS, please speak up.
> 
> Zbyszek

What is the reason for this change?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] Running a non-idempotent command from udev

2023-07-15 Thread Demi Marie Obenour
On Sat, Jul 15, 2023 at 09:00:03PM +0300, Mantas Mikulėnas wrote:
> Is that "once per boot", "once per interface appearance", or "once per
> physical NIC lifetime"? Can the command check its effects directly (i.e.
> check whether a setting has been set, or whatever the task is)?

Once per virtual NIC appearance.  The catch is that the NIC can
disappear and reappear very quickly, and the script must be run every
time this happens.  Furthermore, the script must wait for
network-pre.target.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




[systemd-devel] Running a non-idempotent command from udev

2023-07-15 Thread Demi Marie Obenour
What is the appropriate solution for running a non-idempotent command
from udev?  One command needs to be run exactly once when a network
interface appears, and another command needs to be run exactly once when
a network interface disappears.  Both commands need to run after
network-pre.target, but that can be handled in the scripts themselves.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [systemd-devel] Looking for guidance about starting a systemd service inside the initrd and having it persist after rootfs is mounted

2023-07-13 Thread Demi Marie Obenour
On Thu, Jul 13, 2023 at 12:01:45PM -0400, Brian Masney wrote:
> I am working on a project that has very strict boot time requirements
> in order to have a custom service started within a set time period.
> Waiting for the kernel to initialize the storage controller and mount
> the root filesystem takes too much of the allocated time budget.
> There are various boot speed optimizations that we are working on and
> it's going to take a combination of multiple approaches.

What kind of system is this?  What does the time-critical service depend
on?  If this is a safety requirement (such as the backup camera in a car
turning on fast enough), is Linux the correct choice for this
application, or would a safety-certified RTOS be a better option?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

