Re: [systemd-devel] consider dropping defrag of journals on btrfs

2021-02-10 Thread Phillip Susi


Lennart Poettering writes:

> inode, and then donate the old blocks over. This means the inode nr
> changes, which is something I don't like. Semantically it's only
> marginally better than just creating a new file from scratch.

Wait, what do you mean the inode nr changes?  I thought the whole point
of the block donating thing was that you get a contiguous set of blocks
in the new file, then transfer those blocks back to the old inode so
that the inode number and timestamps of the file don't change.
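
For reference, the difference between "same inode" and "new file from
scratch" is easy to observe from userspace. The following is only a minimal
Python sketch of that distinction and does not use the ext4 donor ioctl at
all; the helper names and the "system.journal" file name are made up for
illustration.

```python
import os
import tempfile


def rewrite_in_place(path: str, data: bytes) -> None:
    """Overwrite the file's contents through its existing inode."""
    with open(path, "r+b") as f:
        f.write(data)
        f.truncate()


def replace_with_copy(path: str, data: bytes) -> None:
    """Write a fresh file next to `path` and rename it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)


def inode_of(path: str) -> int:
    """Return the inode number currently behind `path`."""
    return os.stat(path).st_ino


def demo() -> tuple:
    """Return (inode kept after in-place rewrite, inode kept after replace)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "system.journal")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"old")
        before = inode_of(path)
        rewrite_in_place(path, b"new")
        kept_in_place = inode_of(path) == before
        replace_with_copy(path, b"newer")
        kept_after_replace = inode_of(path) == before
        return kept_in_place, kept_after_replace
```

The in-place rewrite keeps the inode number, while the write-and-rename
variant ends up with a different one, which is the "new file from scratch"
behaviour discussed above.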
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [EXT] Re: consider dropping defrag of journals on btrfs

2021-02-10 Thread Phillip Susi


Chris Murphy writes:

> It's not interleaving. It uses delayed allocation to make random
> writes into sequential writes. It tries harder to keep file blocks

Yes, and when you do that, you are interleaving data from multiple files
into a single stream, which you really shouldn't be doing.  IIRC, XFS
has special I/O streaming modes specifically designed to *prevent* this
from happening, so that multiple video streams can be recorded
simultaneously to different parts of the disk without being fragmented
to hell like that.


[systemd-devel] timesyncd log messages galore

2021-02-10 Thread Ede Wolf

Hi,


My journal gets spammed with messages from timesyncd claiming a changed
network configuration. However, I have not touched the network
configuration at all, and the NTP server even happens to be on the same
subnet. No DHCP either.


Here are two examples: 200 messages in 20 minutes of uptime, or 5800 of
them in 11 hours:


# journalctl -b0 | grep "Network configuration changed, trying to establish connection." | wc -l

199

# uptime
 19:29:34 up 21 min,  3 users,  load average: 1,07, 1,04, 0,87

Another machine:

# journalctl -b0 | grep "Network configuration changed, trying to establish connection." | wc -l

5755
# uptime
 19:32:20 up 10:28,  2 users,  load average: 1.21, 1.20, 1.20
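
To compare the two machines it helps to turn the counts into a rate. Below
is a small Python sketch of the same count as the grep pipeline above, plus
a per-hour figure; the function names are made up for illustration, and the
uptime in minutes has to be read off the uptime output by hand.

```python
from typing import Iterable

MESSAGE = "Network configuration changed, trying to establish connection."


def count_matches(lines: Iterable[str], needle: str = MESSAGE) -> int:
    """Count the lines containing `needle`, like `grep ... | wc -l`."""
    return sum(1 for line in lines if needle in line)


def per_hour(count: int, uptime_minutes: float) -> float:
    """Convert a message count over an uptime in minutes to a rate per hour."""
    return count * 60.0 / uptime_minutes
```

With the numbers above, per_hour(199, 21) and per_hour(5755, 628) come out
at roughly 569 and 550 messages per hour, so the rate is about the same on
both machines.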

Any idea how to stop this? This has been going on for quite a while now, 
but seems to get worse.


systemd 247.3, kernel 5.4.80 and 5.10.14


[systemd-devel] JournalD with TPM

2021-02-10 Thread Andreas Krüger
Hi folks,

I'm a freelancer, and for my client I'm currently working on a board with security features running Debian and systemd. Logging is done by journald and should run in "sealing" mode. For that, the logger must be activated by a special journalctl command, which generates a key (FSS) that shall be stored in a "safe" location. Since there is no possibility (use case) for the key to leave the board, this "safe" location has to be somewhere on the board. For safety reasons, this obviously cannot be a simple cell in RAM.

Fortunately, the board includes a TPM (Trusted Platform Module) that could be used for this purpose. However, when verifying the logger's storage, the TPM has to be "opened" to get the key. This means that a kind of password is used for this, which sits somewhere unprotected in RAM. So with the TPM, I've shifted my problem from the key to the password.

In my opinion, the only way to successfully use a TPM is to let the verification be done by the TPM instead of by the logger. In this case, the key will not leave the TPM.

Has anything been developed in that direction yet?

Or maybe there is another approach to solve my problem?

Have a nice day,

Andreas


Re: [systemd-devel] Should services be able to run without /proc?

2021-02-10 Thread Cristian Rodríguez
Glibc needs /proc mounted so the answer is no.

On Tue, Feb 9, 2021 at 12:05, Antonius Frie <
antonius.f...@ruhr-uni-bochum.de> wrote:

> Hi!
>
> So this is kind of a follow-up to the thread in [1], and the
> corresponding PR in [2].
>
> In short, the PR made some changes to allow for cases where /proc was
> not available in the mount namespace of the service, and added a test
> [3] to make sure that this would work. This test was later removed and
> rewritten to block /sys instead [4], because it turned out that having
> /proc unavailable sometimes caused problems with close_all_fds(), which
> is called in exec_child() after namespaces have been set up.
>
> On current master, services that don't have /proc mounted don't work at
> all anymore, since find_executable_full() ends up opening the given path
> and calling access_fd() on the resulting fd, and access_fd uses
> /proc/self/fd/* to turn the fd back into a path it can call access() on.
> As far as I can tell, the reason for not using access on the path
> directly is that access_fd is more elegant since it avoids a potential
> race condition.
>
> In addition to this, setup_private_users() also needs access to
> /proc/$pid/{uid_map, gid_map, setgroups} to do its job.
>
> Given all this, I guess my question is whether it is still desirable to
> allow units to run without /proc, especially given that ProtectProc and
> ProcSubset exist now.* If not, it might be nice to just always mount
> /proc if it wouldn't otherwise be there (i.e. if RootImage/RootDirectory
> is used); currently, MountAPIVFS=yes is basically a required option
> because of this. (I guess you could mount proc manually, but then you
> can't use ProtectProc/ProcSubset.) I'm a bit unhappy about this, because
> MountAPIVFS also mounts /sys and /dev, and then you need separate
> options just to protect those again. Either way, maybe it would be good
> to explicitly state this requirement in the documentation?
>
> Anyway, I hope that this was okay to post here, I don't really know a
> lot about this and maybe there are good reasons for why things are the
> way they are. I'd be happy about feedback though.
>
> Cheers,
> Antonius
>
> * Using both ProtectProc=ptraceable and ProcSubset=pid really doesn't
> let a lot of things through, and I don't think those interfere with any
> of the functions described above. The only thing I'm unsure about is
> setup_private_users(), since that spawns off a child process which then
> goes and writes to /proc/$parent_pid/, but I guess children can ptrace
> their parents? At least it seemed to work when I just tested it.
>
> [1]: https://lists.freedesktop.org/archives/systemd-devel/2017-April/038634.html
> [2]: https://github.com/systemd/systemd/pull/5985
> [3]: https://github.com/systemd/systemd/pull/6017
> [4]: https://github.com/systemd/systemd/commit/054d871d41039fcfc1a4a661c979941b9660c9e6


Re: [systemd-devel] Should services be able to run without /proc?

2021-02-10 Thread Lennart Poettering
On Tue, 09.02.21 15:57, Antonius Frie (antonius.f...@ruhr-uni-bochum.de) wrote:

> Hi!
>
> So this is kind of a follow-up to the thread in [1], and the corresponding
> PR in [2].
>
> In short, the PR made some changes to allow for cases where /proc was not
> available in the mount namespace of the service, and added a test [3] to
> make sure that this would work. This test was later removed and rewritten to
> block /sys instead [4], because it turned out that having /proc unavailable
> sometimes caused problems with close_all_fds(), which is called in
> exec_child() after namespaces have been set up.
>
> On current master, services that don't have /proc mounted don't work at all
> anymore, since find_executable_full() ends up opening the given path and
> calling access_fd() on the resulting fd, and access_fd uses /proc/self/fd/*
> to turn the fd back into a path it can call access() on. As far as I can
> tell, the reason for not using access on the path directly is that access_fd
> is more elegant since it avoids a potential race condition.

Yes, we try to move to a mode where for most such things that involve
context switches/credential switches/domain transitions we operate via
O_PATH file handles: i.e. resolve in our original context, until we
only have fds pointing to the final thing, and then do the final
operation only on those fds. This should fix a bunch of races and
potential races for us.
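
As a rough illustration of that pattern (a Python sketch, not the systemd
code): pin the file with an O_PATH descriptor first, then run the access
check on the descriptor through its /proc/self/fd/ entry. The names
check_access() and demo() are made up for this example; like access_fd(),
it only works with /proc mounted.

```python
import os
import tempfile


def access_fd(fd: int, mode: int) -> bool:
    """Check access rights of the file behind `fd` via /proc/self/fd/."""
    return os.access(f"/proc/self/fd/{fd}", mode)


def check_access(path: str, mode: int) -> bool:
    """Resolve `path` once into an O_PATH fd, then check access on that fd."""
    fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
    try:
        return access_fd(fd, mode)
    finally:
        os.close(fd)


def demo() -> tuple:
    """Return (readable, executable) for a temporary file with mode 0644."""
    with tempfile.NamedTemporaryFile() as f:
        os.chmod(f.name, 0o644)
        return check_access(f.name, os.R_OK), check_access(f.name, os.X_OK)
```

Because the path is resolved only once, a later rename or symlink swap of
the path cannot redirect the check to a different file.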

> In addition to this, setup_private_users() also needs access to
> /proc/$pid/{uid_map, gid_map, setgroups} to do its job.

Yes, a multitude of Linux APIs are exposed via /proc/. I think outside
of trivial programs it's very hard to avoid having /proc/. glibc
internally encodes access to it all over the place too.

> Given all this, I guess my question is whether it is still desirable to
> allow units to run without /proc, especially given that ProtectProc and
> ProcSubset exist now.* If not, it might be nice to just always mount /proc
> if it wouldn't otherwise be there (i.e. if RootImage/RootDirectory is used);
> currently, MountAPIVFS=yes is basically a required option because of this.
> (I guess you could mount proc manually, but then you can't use
> ProtectProc/ProcSubset.) I'm a bit unhappy about this, because MountAPIVFS
> also mounts /sys and /dev, and then you need separate options just to
> protect those again. Either way, maybe it would be good to explicitly state
> this requirement in the documentation?

We could add MountAPIVFS=proc or so as alternative to yes/no, which
would only mount /proc.

Note that on current git it also mounts /run/, and it defaults to true
if RootImage=/RootDirectory= are used; see commit
6119878480aab4c10ad6af33deab221778683807.

You can still force MountAPIVFS=no, by the way, to get back the status
quo ante: i.e. a RootImage=/RootDirectory= env without /proc.

> Anyway, I hope that this was okay to post here, I don't really know a lot
> about this and maybe there are good reasons for why things are the way they
> are. I'd be happy about feedback though.

Yes, this is the right place.

If you think the MountAPIVFS=proc thing would be desirable to you,
consider posting an RFE issue asking for it on github. Or even better,
submit a PR.

> * Using both ProtectProc=ptraceable and ProcSubset=pid really doesn't
> let a lot of things through, and I don't think those interfere with any of
> the functions described above. The only thing I'm unsure about is
> setup_private_users(), since that spawns off a child process which then
> goes and writes to /proc/$parent_pid/, but I guess children can ptrace
> their parents? At least it seemed to work when I just tested it.

On traditional Linux, ptraceable means "UID matches". With the Yama LSM,
parents can ptrace their children but not vice versa.
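
Which of these rules applies on a given machine can be read from Yama's
ptrace_scope sysctl. Here is a minimal Python sketch that maps the value to
a short description; the function names and description strings are my own,
only the sysctl path and its values 0 to 3 come from the kernel.

```python
from pathlib import Path

YAMA_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

DESCRIPTIONS = {
    0: "classic: any process with a matching UID may attach",
    1: "restricted: a process may only attach to its descendants",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled entirely",
}


def describe_scope_value(value: int) -> str:
    """Map a ptrace_scope value to a human-readable description."""
    return DESCRIPTIONS.get(value, f"unknown ptrace_scope value {value}")


def describe_ptrace_scope(path: Path = YAMA_SCOPE) -> str:
    """Read the sysctl file; a missing file means Yama is not enabled."""
    try:
        value = int(path.read_text().strip())
    except FileNotFoundError:
        return "Yama not enabled: classic UID-based ptrace rules apply"
    return describe_scope_value(value)
```

A value of 1 corresponds to the "parents can ptrace their children but not
vice versa" case.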

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] consider dropping defrag of journals on btrfs

2021-02-10 Thread Lennart Poettering
On Tue, 09.02.21 10:17, Phillip Susi (ph...@thesusis.net) wrote:

>
> Chris Murphy writes:
>
> > And I agree 8MB isn't a big deal. Does anyone complain about journal
> > fragmentation on ext4 or xfs? If not, then we come full circle to my
> > second email in the thread which is don't defragment when nodatacow,
> > only defragment when datacow. Or use BTRFS_IOC_DEFRAG_RANGE and
> > specify 8MB length. That does seem to consistently no op on nodatacow
> > journals which have 8MB extents.
>
> Ok, I agree there.
>
> > The reason I'm dismissive is because the nodatacow fragment case is
> > the same as ext4 and XFS; the datacow fragment case is both
> > spectacular and non-deterministic. The workload will matter where
>
> Your argument seems to be that it's no worse than ext4 and so if we
> don't defrag there, why on btrfs?  Lennart seems to be arguing that the
> only reason systemd doesn't defrag on ext4 is because the ioctl is
> harder to use.

It's not just harder to use, it's uglier: you have to create a new
inode, and then donate the old blocks over. This means the inode nr
changes, which is something I don't like. Semantically it's only
marginally better than just creating a new file from scratch.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] consider dropping defrag of journals on btrfs

2021-02-10 Thread Lennart Poettering
On Mon, 08.02.21 22:13, Chris Murphy (li...@colorremedies.com) wrote:

> On Mon, Feb 8, 2021 at 7:56 AM Phillip Susi wrote:
> >
> >
> > Chris Murphy writes:
> >
> > >> It sounds like you are arguing that it is better to do the wrong thing
> > >> on all SSDs rather than do the right thing on ones that aren't broken.
> > >
> > > No I'm suggesting there isn't currently a way to isolate
> > > defragmentation to just HDDs.
> >
> > Yes, but it sounded like you were suggesting that we shouldn't even try,
> > not just that it isn't 100% accurate.  Sure, some SSDs will be stupid
> > and report that they are rotational, but most aren't stupid, so it's a
> > good idea to disable the defragmentation on drives that report that they
> > are non rotational.
>
> As far as I've seen, all USB devices report rotational: all USB flash
> drives, and any SSD in an enclosure.
>
> Maybe some way of estimating rotational based on latency standard
> deviation, and stick that in sysfs, instead of trusting device
> reporting. But in the meantime, the imperfect rule could be do not
> defragment unless it's SCSI/SATA/SAS and it reports it's rotational.

btrfs itself has a knob declaring whether something is an SSD or not,
configurable via a mount option. Of course, one could bind any
higher-level logic to that same knob, and thus make it btrfs' own
problem, or the admin's.
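
A rough Python sketch of what binding to that knob could look like: read
the mount options of the btrfs mount from /proc/self/mounts-style text and
look for ssd/nossd. The function names and the "defragment unless btrfs says
ssd" policy are assumptions for illustration only, not something systemd
does.

```python
from typing import Optional


def btrfs_ssd_option(mounts_text: str, mount_point: str) -> Optional[bool]:
    """Return True for ssd/ssd_spread, False for nossd, None if not stated."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point or fields[2] != "btrfs":
            continue
        options = set(fields[3].split(","))
        if "nossd" in options:
            return False
        if "ssd" in options or "ssd_spread" in options:
            return True
        return None
    return None


def should_defragment(mounts_text: str, mount_point: str) -> bool:
    """Hypothetical policy: defragment unless btrfs treats the mount as SSD."""
    return btrfs_ssd_option(mounts_text, mount_point) is not True
```

On a live system mounts_text would be the contents of /proc/self/mounts,
and mount_point the mount that holds the journal directory.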

Lennart

--
Lennart Poettering, Berlin