Re: [systemd-devel] consider dropping defrag of journals on btrfs
Lennart Poettering writes:

> inode, and then donate the old blocks over. This means the inode nr
> changes, which is something I don't like. Semantically it's only
> marginally better than just creating a new file from scratch.

Wait, what do you mean the inode nr changes? I thought the whole point of the block donating thing was that you get a contiguous set of blocks in the new file, then transfer those blocks back to the old inode so that the inode number and timestamps of the file don't change.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
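Whether a file keeps its inode number across a rewrite is easy to observe from userspace. The following is a minimal Python sketch under stated assumptions: it performs no block donation at all, it only contrasts an in-place write with the "create a new file from scratch and rename it over" approach that the quoted text uses as its comparison point. The file name and helper names are made up for illustration.

```python
import os
import tempfile


def inode_after_in_place_write(path):
    """Append to the existing file; the path keeps pointing at the same inode."""
    with open(path, "ab") as f:
        f.write(b"more data\n")
    return os.stat(path).st_ino


def inode_after_replace(path):
    """Write a fresh copy next to the file, then rename it over the original."""
    tmp = path + ".new"
    with open(path, "rb") as src, open(tmp, "wb") as dst:
        dst.write(src.read())
    os.replace(tmp, path)  # atomic rename; the path now names the new inode
    return os.stat(path).st_ino


def demo():
    """Return (original, after in-place write, after replace) inode numbers."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.journal")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"header\n")
        original = os.stat(path).st_ino
        in_place = inode_after_in_place_write(path)
        replaced = inode_after_replace(path)
        return original, in_place, replaced


if __name__ == "__main__":
    original, in_place, replaced = demo()
    print("in-place write keeps inode:", original == in_place)
    print("replace changes inode:", original != replaced)
```

Anything that identifies the file by inode number (open file descriptors, inotify watches, hard links) sees the second case as a different file, which is the semantic concern raised in the quote.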
Re: [systemd-devel] [EXT] Re: consider dropping defrag of journals on btrfs
Chris Murphy writes:

> It's not interleaving. It uses delayed allocation to make random
> writes into sequential writes. It tries harder to keep file blocks

Yes, and when you do that, you are interleaving data from multiple files into a single stream, which you really shouldn't be doing. IIRC, XFS has special I/O streaming modes specifically designed to *prevent* this from happening and record multiple video streams simultaneously to different parts of the disk to keep them from being fragmented to hell like that.
[systemd-devel] timesyncd log messages galore
Hi,

My journal gets spammed with messages from timesyncd, claiming a changed network connection. However, I have not touched the network configuration at all, and the NTP server even happens to be on the same subnet. No DHCP either.

Here are two examples: 200 messages in 20 minutes of uptime, or 5800 of them in 11 hours:

# journalctl -b0 | grep "Network configuration changed, trying to establish connection." | wc -l
199
# uptime
19:29:34 up 21 min, 3 users, load average: 1,07, 1,04, 0,87

Another machine:

# journalctl -b0 | grep "Network configuration changed, trying to establish connection." | wc -l
5755
# uptime
19:32:20 up 10:28, 2 users, load average: 1.21, 1.20, 1.20

Any idea how to stop this? This has been going on for quite a while now, but seems to be getting worse.

systemd 247.3, kernel 5.4.80 and 5.10.14
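To put the two samples on a common scale, the counts can be converted into a rate. This is only arithmetic on the figures quoted in the report (199 messages in 21 minutes, 5755 messages in 10 hours 28 minutes); the function name is made up for illustration.

```python
def messages_per_minute(count, uptime_minutes):
    """Average number of log messages per minute of uptime."""
    return count / uptime_minutes


# Figures copied from the report above.
machine_a = messages_per_minute(199, 21)             # "up 21 min"
machine_b = messages_per_minute(5755, 10 * 60 + 28)  # "up 10:28"

print(f"machine A: {machine_a:.1f} messages/min")
print(f"machine B: {machine_b:.1f} messages/min")
```

Both machines come out at roughly nine to ten messages per minute, so the rate looks fairly steady across short and long uptimes. That is merely a reading of the numbers, not a diagnosis of what triggers the message.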
[systemd-devel] JournalD with TPM
Hi folks,

I'm a freelancer, and for my client I'm currently working on a board with security features running Debian OS and SystemD. Logging is done by JournalD and should run in "sealing" mode. For that, the Logger must be activated by a special journalctl command, which generates a key (FSS) that shall be stored in a "safe" location.

Since there is no possibility (use case) for the key to leave the board, this "safe" location has to be somewhere on the board. For safety reasons, this obviously cannot be a simple cell in RAM. Fortunately, the board includes a TPM (Trusted Platform Module) that could be used for this purpose.

However, when verifying the logger's storage, the TPM has to be "opened" to get the key. This means that a kind of password is used for this, which sits somewhere unprotected in RAM. So with the TPM, I've shifted my problem from the key to the password.

In my opinion, the only way to successfully use a TPM is to let the verification be done by the TPM instead of by the logger. In this case, the key will not leave the TPM.

Has anything been developed in that direction yet? Or maybe there is another approach to solve my problem?

Have a nice day,
Andreas
Re: [systemd-devel] Should services be able to run without /proc?
Glibc needs /proc mounted so the answer is no.

On Tue, Feb 9, 2021 at 12:05, Antonius Frie <antonius.f...@ruhr-uni-bochum.de> wrote:

> Hi!
>
> So this is kind of a follow-up to the thread in [1], and the
> corresponding PR in [2].
>
> In short, the PR made some changes to allow for cases where /proc was
> not available in the mount namespace of the service, and added a test
> [3] to make sure that this would work. This test was later removed and
> rewritten to block /sys instead [4], because it turned out that having
> /proc unavailable sometimes caused problems with close_all_fds(), which
> is called in exec_child() after namespaces have been set up.
>
> On current master, services that don't have /proc mounted don't work at
> all anymore, since find_executable_full() ends up opening the given path
> and calling access_fd() on the resulting fd, and access_fd uses
> /proc/self/fd/* to turn the fd back into a path it can call access() on.
> As far as I can tell, the reason for not using access on the path
> directly is that access_fd is more elegant since it avoids a potential
> race condition.
>
> In addition to this, setup_private_users() also needs access to
> /proc/$pid/{uid_map, gid_map, setgroups} to do its job.
>
> Given all this, I guess my question is whether it is still desirable to
> allow units to run without /proc, especially given that ProtectProc and
> ProcSubset exist now.* If not, it might be nice to just always mount
> /proc if it wouldn't otherwise be there (i.e. if RootImage/RootDirectory
> is used); currently, MountAPIVFS=yes is basically a required option
> because of this. (I guess you could mount proc manually, but then you
> can't use ProtectProc/ProcSubset.) I'm a bit unhappy about this, because
> MountAPIVFS also mounts /sys and /dev, and then you need separate
> options just to protect those again. Either way, maybe it would be good
> to explicitly state this requirement in the documentation?
>
> Anyway, I hope that this was okay to post here, I don't really know a
> lot about this and maybe there are good reasons for why things are the
> way they are. I'd be happy about feedback though.
>
> Cheers,
> Antonius
>
> * Using both ProtectProc=ptraceable and ProcSubset=pid really doesn't
> let a lot of things through, and I don't think those interfere with any
> of the functions described above. The only thing I'm unsure about is
> setup_private_users(), since that spawns off a child process which then
> goes and writes to /proc/$parent_pid/, but I guess children can ptrace
> their parents? At least it seemed to work when I just tested it.
>
> [1]: https://lists.freedesktop.org/archives/systemd-devel/2017-April/038634.html
> [2]: https://github.com/systemd/systemd/pull/5985
> [3]: https://github.com/systemd/systemd/pull/6017
> [4]: https://github.com/systemd/systemd/commit/054d871d41039fcfc1a4a661c979941b9660c9e6
Re: [systemd-devel] Should services be able to run without /proc?
On Tue, 09.02.21 15:57, Antonius Frie (antonius.f...@ruhr-uni-bochum.de) wrote:

> Hi!
>
> So this is kind of a follow-up to the thread in [1], and the corresponding
> PR in [2].
>
> In short, the PR made some changes to allow for cases where /proc was not
> available in the mount namespace of the service, and added a test [3] to
> make sure that this would work. This test was later removed and rewritten to
> block /sys instead [4], because it turned out that having /proc unavailable
> sometimes caused problems with close_all_fds(), which is called in
> exec_child() after namespaces have been set up.
>
> On current master, services that don't have /proc mounted don't work at all
> anymore, since find_executable_full() ends up opening the given path and
> calling access_fd() on the resulting fd, and access_fd uses /proc/self/fd/*
> to turn the fd back into a path it can call access() on. As far as I can
> tell, the reason for not using access on the path directly is that access_fd
> is more elegant since it avoids a potential race condition.

Yes, we are trying to move to a mode where, for most such things that involve context switches/credential switches/domain transitions, we operate via O_PATH file handles: i.e. resolve in our original context until we only have fds pointing to the final thing, and then do the final operation only on those fds. This should fix a bunch of races and potential races for us.

> In addition to this, setup_private_users() also needs access to
> /proc/$pid/{uid_map, gid_map, setgroups} to do its job.

Yes, a multitude of Linux APIs are exposed via /proc/. I think outside of trivial programs it's very hard to avoid having /proc/. glibc internally encodes access to it all over the place too.
> Given all this, I guess my question is whether it is still desirable to
> allow units to run without /proc, especially given that ProtectProc and
> ProcSubset exist now.* If not, it might be nice to just always mount /proc
> if it wouldn't otherwise be there (i.e. if RootImage/RootDirectory is used);
> currently, MountAPIVFS=yes is basically a required option because of this.
> (I guess you could mount proc manually, but then you can't use
> ProtectProc/ProcSubset.) I'm a bit unhappy about this, because MountAPIVFS
> also mounts /sys and /dev, and then you need separate options just to
> protect those again. Either way, maybe it would be good to explicitly state
> this requirement in the documentation?

We could add MountAPIVFS=proc or so as an alternative to yes/no, which would only mount /proc. Note that on current git it actually also mounts /run/, and that on current git it also defaults to true if RootImage=/RootDirectory= are used, see 6119878480aab4c10ad6af33deab221778683807. You can still force MountAPIVFS=no btw, to get back the status quo ante: i.e. a RootImage=/RootDirectory= env without /proc.

> Anyway, I hope that this was okay to post here, I don't really know a lot
> about this and maybe there are good reasons for why things are the way they
> are. I'd be happy about feedback though.

Yes, this is the right place. If you think the MountAPIVFS=proc thing would be desirable to you, consider posting an RFE issue asking for it on github. Or even better, submit a PR.

> * Using both ProtectProc=ptraceable and ProcSubset=pid really doesn't
> let a lot of things through, and I don't think those interfere with any of
> the functions described above. The only thing I'm unsure about is
> setup_private_users(), since that spawns off a child process which then
> goes and writes to /proc/$parent_pid/, but I guess children can ptrace
> their parents? At least it seemed to work when I just tested it.

On traditional Linux, "ptraceable" simply means "uid matches".
With the Yama LSM, parents can ptrace their children but not vice versa.

Lennart

--
Lennart Poettering, Berlin
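The access_fd() pattern discussed in this message (resolve the path once to an O_PATH fd, then run the access check through /proc/self/fd/) can be sketched in a few lines. This is a Python illustration of the idea as described in the thread, not systemd's C implementation; the function is_executable() is a hypothetical stand-in for the lookup step and is not a systemd API.

```python
import os


def access_fd(fd, mode):
    """Check access rights on an already-opened fd.

    Goes through the /proc/self/fd/<fd> magic link, so it needs /proc
    to be mounted in the current mount namespace.
    """
    return os.access(f"/proc/self/fd/{fd}", mode)


def is_executable(path):
    """Resolve the path once with O_PATH, then test X_OK on the fd only."""
    fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
    try:
        return access_fd(fd, os.X_OK)
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("/bin/sh executable:", is_executable("/bin/sh"))
```

Because the check is made against the fd rather than the path string, a rename or symlink swap between lookup and check cannot redirect it. The cost is the one noted in the thread: without /proc the magic link does not exist, so the check fails even for a perfectly valid executable.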
Re: [systemd-devel] consider dropping defrag of journals on btrfs
On Tue, 09.02.21 10:17, Phillip Susi (ph...@thesusis.net) wrote:

>
> Chris Murphy writes:
>
> > And I agree 8MB isn't a big deal. Does anyone complain about journal
> > fragmentation on ext4 or xfs? If not, then we come full circle to my
> > second email in the thread which is don't defragment when nodatacow,
> > only defragment when datacow. Or use BTRFS_IOC_DEFRAG_RANGE and
> > specify 8MB length. That does seem to consistently no op on nodatacow
> > journals which have 8MB extents.
>
> Ok, I agree there.
>
> > The reason I'm dismissive is because the nodatacow fragment case is
> > the same as ext4 and XFS; the datacow fragment case is both
> > spectacular and non-deterministic. The workload will matter where
>
> Your argument seems to be that it's no worse than ext4 and so if we
> don't defrag there, why on btrfs? Lennart seems to be arguing that the
> only reason systemd doesn't defrag on ext4 is because the ioctl is
> harder to use.

It's not just harder to use, it's uglier: you have to create a new inode, and then donate the old blocks over. This means the inode nr changes, which is something I don't like. Semantically it's only marginally better than just creating a new file from scratch.

Lennart

--
Lennart Poettering, Berlin
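The suggestion quoted above, calling BTRFS_IOC_DEFRAG_RANGE with an 8MB parameter, could look roughly like the following from Python. The struct layout and ioctl number follow linux/btrfs.h to the best of my reading. The mail does not say which field should carry the 8MB, so mapping it onto extent_thresh (extents larger than the threshold are treated as already defragmented) is an assumption of this sketch, as are all helper names. The _IOW encoding shown is the x86/ARM one.

```python
import fcntl
import struct

BTRFS_IOCTL_MAGIC = 0x94
EIGHT_MIB = 8 * 1024 * 1024
WHOLE_FILE = 2**64 - 1  # (u64)-1: defragment up to the end of the file

# struct btrfs_ioctl_defrag_range_args:
#   u64 start, u64 len, u64 flags,
#   u32 extent_thresh, u32 compress_type, u32 unused[4]
DEFRAG_RANGE_ARGS = struct.Struct("=QQQII4I")


def _iow(magic, nr, size):
    """Equivalent of the kernel's _IOW() macro (x86/ARM bit layout)."""
    return (1 << 30) | (size << 16) | (magic << 8) | nr


BTRFS_IOC_DEFRAG_RANGE = _iow(BTRFS_IOCTL_MAGIC, 16, DEFRAG_RANGE_ARGS.size)


def pack_defrag_args(start=0, length=WHOLE_FILE, extent_thresh=EIGHT_MIB):
    """Build the argument struct; flags and compress_type are left at 0."""
    return DEFRAG_RANGE_ARGS.pack(start, length, 0, extent_thresh, 0, 0, 0, 0, 0)


def defrag_range(path, extent_thresh=EIGHT_MIB):
    """Ask btrfs to defragment `path`, skipping extents above the threshold.

    Assumption: the 8MB from the mail is passed as extent_thresh.
    Raises OSError if the file is not on btrfs.
    """
    with open(path, "r+b") as f:
        args = pack_defrag_args(extent_thresh=extent_thresh)
        fcntl.ioctl(f.fileno(), BTRFS_IOC_DEFRAG_RANGE, args)


if __name__ == "__main__":
    print(hex(BTRFS_IOC_DEFRAG_RANGE), DEFRAG_RANGE_ARGS.size)
```

With a threshold of 8MB, a nodatacow journal that already consists of 8MB extents should give the kernel nothing to rewrite, which would match the no-op behaviour reported in the quote.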
Re: [systemd-devel] consider dropping defrag of journals on btrfs
On Mon, 08.02.21 22:13, Chris Murphy (li...@colorremedies.com) wrote:

> On Mon, Feb 8, 2021 at 7:56 AM Phillip Susi wrote:
> >
> > Chris Murphy writes:
> >
> > >> It sounds like you are arguing that it is better to do the wrong thing
> > >> on all SSDs rather than do the right thing on ones that aren't broken.
> > >
> > > No I'm suggesting there isn't currently a way to isolate
> > > defragmentation to just HDDs.
> >
> > Yes, but it sounded like you were suggesting that we shouldn't even try,
> > not just that it isn't 100% accurate. Sure, some SSDs will be stupid
> > and report that they are rotational, but most aren't stupid, so it's a
> > good idea to disable the defragmentation on drives that report that they
> > are non rotational.
>
> So far I've seen, all USB devices report rotational. All USB flash
> drives, and any SSD in an enclosure.
>
> Maybe some way of estimating rotational based on latency standard
> deviation, and stick that in sysfs, instead of trusting device
> reporting. But in the meantime, the imperfect rule could be do not
> defragment unless it's SCSI/SATA/SAS and it reports it's rotational.

btrfs itself has a knob declaring whether something is ssd or not ssd, configurable via the mount option. Of course, one would bind any higher level logic to that same thing, and thus make it btrfs' own problem, or the admin's.

Lennart

--
Lennart Poettering, Berlin
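For reference, the device-reported flag that the quoted part of this message distrusts is exposed by the kernel as /sys/block/<dev>/queue/rotational, containing 0 or 1. A minimal Python sketch of reading it follows. The function names, the sysfs_root parameter (there only so the code can be pointed at a test tree) and the "unknown means do not defragment" policy are assumptions made for illustration; the sketch covers neither the bus-type half of the proposed rule nor the btrfs mount option mentioned in the reply.

```python
import os


def is_rotational(block_device, sysfs_root="/sys"):
    """Return True if the kernel reports the device as rotational.

    Reads <sysfs_root>/block/<dev>/queue/rotational, which holds "0" or "1".
    As noted in the thread, USB flash drives and SSDs in enclosures may
    still report "1" here.
    """
    path = os.path.join(sysfs_root, "block", block_device, "queue", "rotational")
    with open(path) as f:
        return f.read().strip() == "1"


def should_defragment(block_device, sysfs_root="/sys"):
    """Hypothetical policy: defragment only when the flag says rotational."""
    try:
        return is_rotational(block_device, sysfs_root)
    except OSError:
        return False  # unknown device: err on the side of not defragmenting
```

A call such as should_defragment("sda") is only as reliable as the flag itself, which is the limitation the thread is arguing about.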