Re: [systemd-devel] jio test results
Dear Vito,

On 26.11.20 at 19:54, Vito Caputo wrote:
> On Thu, Nov 26, 2020 at 12:28:16PM +0100, Paul Menzel wrote:

[…]

> > Thank you. It builds fine on Debian Sid/unstable with GCC 10.2.0-19 (some
> > warnings), and the results are attached.
>
> Any chance you could mail me the warnings? I see none here, would
> prefer a clean build for others.

Hopefully there won’t be line wrapping when viewed with `format=flowed`.

```
$ gcc --version
gcc (Debian 10.2.0-19) 10.2.0
[…]
$ make
[…]
CC jio-journals.o
journals.c: In function ‘got_hash_table_iter_object_header’:
journals.c:361:4: warning: converting a packed ‘HashedObjectHeader’ pointer (alignment 1) to a ‘DataObject’ pointer (alignment 8) may result in an unaligned pointer value [-Waddress-of-packed-member]
  361 | DataObject *data_object = (DataObject *)iter_object_header;
      | ^~
In file included from journals.h:14,
                 from journals.c:30:
upstream/journal-def.h:80:8: note: defined here
   80 | struct HashedObjectHeader {
      | ^~
upstream/journal-def.h:95:8: note: defined here
   95 | struct DataObject DataObject__contents;
      | ^~
journals.c:372:4: warning: converting a packed ‘HashedObjectHeader’ pointer (alignment 1) to a ‘FieldObject’ pointer (alignment 8) may result in an unaligned pointer value [-Waddress-of-packed-member]
  372 | FieldObject *field_object = (FieldObject *)iter_object_header;
      | ^~~
In file included from journals.h:14,
                 from journals.c:30:
upstream/journal-def.h:80:8: note: defined here
   80 | struct HashedObjectHeader {
      | ^~
upstream/journal-def.h:105:8: note: defined here
  105 | struct FieldObject FieldObject__contents;
      | ^~~
[…]
```

Kind regards,

Paul
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring
On Thu, Nov 26, 2020 at 07:58:38PM +0100, Lennart Poettering wrote:
> On Wed, 25.11.20 19:02, Vito Caputo (vcap...@pengaru.com) wrote:
>
> > Hello systemd-devel,
> >
> > Recent discussion here about journal space consumption happened to
> > occur while I was exploring use of the new io_uring linux kernel
> > interface in combination with journal files.
>
> I'd be really curious if an io_uring-based reader could outperform the
> mmap-based ones. I have the suspicion that yes, since paging in stuff
> is slow, and doesn't really allow for reordering.
>

Same here, I expect superior uncached performance, and either equal or slightly worse cached performance. So the worst case should be better, and it might be a worthwhile trade-off for that alone.

Also, the programming style is inherently quite different. I'm not sure if it's a net win or loss there; it probably depends on the preference of whoever is stuck maintaining it.

I'm also curious to see if io_uring is efficient enough that one can treat the page cache as the application cache, and completely skip implementing (and allocating space for) any caching layer in the userspace process.

In jio currently, all the tiny object header reads of `jio report usage` are done with discrete io_uring requests; there's no equivalent to the mmap-cache found in systemd-journald. This results in frequently crossing the application:kernel boundary, granularly fetching journal data from the page cache. It makes for a *very* simple implementation, but it is the kind of thing you'd try hard to avoid with something like plain read() syscalls if you were concerned about performance.

My knee-jerk reaction to such an approach is that it'd be crazy not to cache at least the entry arrays and some hot set of entries and data objects local to the process during searches. But starting with a dumb implementation leaning hard on io_uring seems worth at least measuring, and maybe it'd prove Good Enough (tm), while staying within a much smaller address space.

And it's not like such an implementation would be difficult to extend with some targeted caching of objects where needed.

BTW, do you have any objections to jio discussions occurring on systemd-devel?

Regards,
Vito Caputo
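To make the trade-off above concrete, here is a minimal Python sketch that contrasts one syscall per tiny read with a small in-process page cache. It is not jio's code and does not use io_uring: plain `os.pread()` stands in for the kernel round trip, and `DirectReader`, `PageCachedReader`, `PAGE_SIZE`, the 16-byte record size and the 64-byte stride are all assumptions made for illustration.

```python
import os
import tempfile
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed cache granularity, not taken from jio


class DirectReader:
    """Issue one pread() per request, i.e. no userspace cache at all."""

    def __init__(self, fd):
        self.fd = fd
        self.syscalls = 0

    def read(self, offset, length):
        self.syscalls += 1
        return os.pread(self.fd, length, offset)


class PageCachedReader:
    """Keep a small LRU of whole pages in the process to avoid repeated reads."""

    def __init__(self, fd, max_pages=64):
        self.fd = fd
        self.max_pages = max_pages
        self.pages = OrderedDict()  # page number -> bytes
        self.syscalls = 0

    def _page(self, number):
        if number in self.pages:
            self.pages.move_to_end(number)  # mark as most recently used
            return self.pages[number]
        self.syscalls += 1
        data = os.pread(self.fd, PAGE_SIZE, number * PAGE_SIZE)
        self.pages[number] = data
        if len(self.pages) > self.max_pages:
            self.pages.popitem(last=False)  # evict least recently used
        return data

    def read(self, offset, length):
        out = bytearray()
        while length > 0:
            number, start = divmod(offset, PAGE_SIZE)
            chunk = self._page(number)[start:start + length]
            if not chunk:  # reached end of file
                break
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)


def demo():
    """Read 16-byte records at 64-byte strides from a scratch file both ways."""
    with tempfile.TemporaryFile() as f:
        f.write(bytes(range(256)) * 64)  # 16 KiB of sample data
        f.flush()
        fd = f.fileno()
        direct, cached = DirectReader(fd), PageCachedReader(fd)
        offsets = range(0, 16384, 64)
        same = all(direct.read(o, 16) == cached.read(o, 16) for o in offsets)
        return same, direct.syscalls, cached.syscalls
```

`demo()` returns whether both readers produced identical bytes, followed by the number of `pread()` calls each one made. The sketch only counts round trips; whether io_uring makes the uncached variant cheap enough in practice is exactly the open question raised above.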
Re: [systemd-devel] [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring
On Wed, 25.11.20 19:02, Vito Caputo (vcap...@pengaru.com) wrote:

> Hello systemd-devel,
>
> Recent discussion here about journal space consumption happened to
> occur while I was exploring use of the new io_uring linux kernel
> interface in combination with journal files.

I'd be really curious if an io_uring-based reader could outperform the mmap-based ones. I have the suspicion that yes, since paging in stuff is slow, and doesn't really allow for reordering.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] jio test results (was: [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring)
On Thu, Nov 26, 2020 at 12:28:16PM +0100, Paul Menzel wrote:
> [No idea, if I should carbon copy the list or not.]

I hope you don't mind, but I've CC'd the list in replying. At this point it's helpful for others to see:

1. jio compiled and worked
2. jio didn't crash or rm -Rf / your system
3. what kind of information it produced

>
> On 26.11.20 at 04:02, Vito Caputo wrote:
> > Hello systemd-devel,
> >
> > Recent discussion here about journal space consumption happened to
> > occur while I was exploring use of the new io_uring linux kernel
> > interface in combination with journal files.
> >
> > What began as a curiosity about this new kernel interface, and what it
> > would be like to program journal file processing with some kind of
> > continuation-passing style C code bolted onto it, evolved into
> > something already providing new visibility into journal-file space
> > utilization not AFAIK currently offered by systemd's journalctl (not
> > that it couldn't be added there too).
> >
> > I've called this program jio, pronounced "jai-oh".
>
> Thank you for writing and sharing this.
>

You're welcome, it was fun to write. Thanks for being a guinea pig!

> > At this time jio implements three basic functions:
> >
> > 1. `jio report usage`
> >
> > READ-ONLY
> >
> > Measures and reports space actually used by objects in all
> > accessible journal files, classified by object type.
> >
> >
> > 2. `jio report tail-waste`
> >
> > READ-ONLY
> >
> > Measures and reports unused space allocated to the tail ends of
> > all journal files, classified by journal state. Journal state
> > being Online, Offline, or Archived.
>
> Thank you. It builds fine on Debian Sid/unstable with GCC 10.2.0-19 (some
> warnings), and the results are attached.
>

Any chance you could mail me the warnings? I see none here, would prefer a clean build for others.

> Per-object-type usage:
>          UNUSED: [0] 0.00 B
>            Data: [2404148] 261.85 MiB
>           Field: [6491] 331.51 KiB
>           Entry: [1521426] 712.38 MiB
>   DataHashTable: [100] 355.56 MiB
>  FieldHashTable: [100] 521.88 KiB
>      EntryArray: [554339] 577.14 MiB
>             Tag: [0] 0.00 B
> Aggregate object usage: 1.86 GiB of 2.22 GiB spanning 100 journal files
>
>
> Totals:
> Tail-waste by state:
>    Offline [3]: 7.23 MiB, 2% of all tail-waste
>     Online [1]: 7.37 MiB, 2% of all tail-waste
>  Archived [96]: 339.82 MiB, 95% of all tail-waste
>
> Aggregate tail-waste: 354.42 MiB, 15% of 2.22 GiB spanning 100 journal
> files

This is useful information. On my system, an accumulation of Offline user journals accounts for the majority of the tail-waste, and because they are Offline, `jio reclaim tail-waste` won't touch them. We need more samples from the field, but your results suggest that this may be unique to my system.

Thanks again!
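As a cross-check of the quoted report, the following Python sketch recomputes the aggregates from the per-line figures. The numbers are copied from the output above. The helper `whole_percent` and the assumption that jio truncates percentages to whole numbers (rather than rounding) are mine, inferred only from the fact that truncation reproduces the printed values.

```python
MIB_PER_GIB = 1024
KIB_PER_MIB = 1024

# Figures copied from the quoted report, in MiB.
tail_waste_mib = {"Offline": 7.23, "Online": 7.37, "Archived": 339.82}
total_journal_gib = 2.22

# Per-object-type usage from the same report, converted to MiB.
object_usage_mib = {
    "Data": 261.85,
    "Field": 331.51 / KIB_PER_MIB,
    "Entry": 712.38,
    "DataHashTable": 355.56,
    "FieldHashTable": 521.88 / KIB_PER_MIB,
    "EntryArray": 577.14,
}


def whole_percent(part, whole):
    """Percentage truncated to an integer (assumed, not confirmed, jio behaviour)."""
    return int(100 * part / whole)


aggregate_waste_mib = round(sum(tail_waste_mib.values()), 2)
shares = {
    state: whole_percent(mib, aggregate_waste_mib)
    for state, mib in tail_waste_mib.items()
}
overall_waste = whole_percent(aggregate_waste_mib, total_journal_gib * MIB_PER_GIB)
aggregate_usage_gib = round(sum(object_usage_mib.values()) / MIB_PER_GIB, 2)
```

Under these assumptions the three states add up to the reported 354.42 MiB, the shares come out as 2%, 2% and 95%, the overall waste as 15% of 2.22 GiB, and the object usage as 1.86 GiB, all matching the report. The truncated shares sum to 99% rather than 100%.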
[systemd-devel] systemd 247 released
A new, official systemd release has just been tagged. Please download the tarball here:

    https://github.com/systemd/systemd/archive/v247.tar.gz

Changes since the previous release:

* KERNEL API INCOMPATIBILITY: Linux 4.14 introduced two new uevents "bind" and "unbind" to the Linux device model. When this kernel change was made, systemd-udevd was only minimally updated to handle and propagate these new event types. The introduction of these new uevents (which are typically generated for USB devices and devices needing a firmware upload before being functional) resulted in a number of issues which we so far didn't address. We hoped the kernel maintainers would themselves address these issues in some form, but that did not happen. To handle them properly, many (if not most) udev rules files shipped in various packages need updating, and so do many programs that monitor or enumerate devices with libudev or sd-device, or otherwise process uevents. Please note that this incompatibility is not the fault of systemd or udev, but caused by an incompatible kernel change that happened back in Linux 4.14, and is becoming more and more visible as the new uevents are generated by more kernel drivers.

  To minimize issues resulting from this kernel change (but not avoid them entirely), starting with systemd-udevd 247 the udev "tags" concept (which is a concept for marking and filtering devices during enumeration and monitoring) has been reworked: udev tags are now "sticky", meaning that once a tag is assigned to a device it will not be removed from the device again until the device itself is removed (i.e. unplugged). This makes sure that any application monitoring devices that match a specific tag is guaranteed to see both the uevents where the device starts being relevant and those where it stops being relevant (the latter now regularly happening due to the new "unbind" uevent type). The udev tags concept is hence now tied to a *device* instead of a device *event*, unlike for example udev properties, whose lifecycle (as before) is generally tied to a device event, meaning that the previously determined properties are forgotten whenever a new uevent is processed.

  With the newly redefined udev tags concept, it is sometimes necessary to determine which tags were applied by the most recent uevent/database update, in order to discern them from those originating from earlier uevents/database updates of the same device. To accommodate this, a new automatic property CURRENT_TAGS has been added that works similarly to the existing TAGS property but only lists tags set by the most recent uevent/database update. Similarly, the libudev/sd-device API has been updated with new functions to enumerate these 'current' tags, in addition to the existing APIs that now enumerate the 'sticky' ones.

  To properly handle "bind"/"unbind" on Linux 4.14 and newer it is essential that all udev rules files and applications are updated to handle the new events. Specifically:

  • All rule files that currently use a header guard similar to

        ACTION!="add|change",GOTO="xyz_end"

    should be updated to use

        ACTION=="remove",GOTO="xyz_end"

    instead, so that the properties/tags they add are also applied whenever "bind" (or "unbind") is seen. (This is most important for all physical device types, i.e. those for which "bind" and "unbind" are currently generated. For all other device types this change is still recommended but not as important, though it certainly prepares for future kernel uevent type additions.)

  • Similarly, all code monitoring devices that contains an 'if' branch discerning the "add" + "change" uevent actions from all other uevent actions (i.e. considering devices only relevant after "add" or "change", and irrelevant on all other events) should be reworked to instead negatively check for "remove" only (i.e. considering devices relevant after all event types, except for "remove", which invalidates the device). Note that this also means that devices should be considered relevant on "unbind", even though conceptually this, in some form, invalidates the device. Since the precise effect of "unbind" is not generically defined, devices should be considered relevant even after "unbind"; however, I/O errors accessing the device should then be handled gracefully.
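The second recommendation above can be sketched in a few lines of Python. This is only an illustration of the "negatively check for remove" logic on plain (action, devpath) pairs; it does not use libudev or sd-device, and the function names and device paths are hypothetical.

```python
def is_relevant_old(action):
    """Old-style check: only "add" and "change" make a device relevant."""
    return action in ("add", "change")


def is_relevant(action):
    """Recommended check: every action except "remove" keeps the device relevant."""
    return action != "remove"


def current_devices(events):
    """Fold (action, devpath) pairs into the set of devices still considered present."""
    present = set()
    for action, devpath in events:
        if is_relevant(action):
            present.add(devpath)
        else:
            present.discard(devpath)
    return present


# Hypothetical event stream; the device paths are made up for illustration.
events = [
    ("add", "/devices/example/a"),
    ("bind", "/devices/example/a"),
    ("add", "/devices/example/b"),
    ("unbind", "/devices/example/a"),
    ("remove", "/devices/example/b"),
]
```

With this stream, `current_devices(events)` keeps device `a` even after its "unbind" and drops device `b` on "remove". A caller that keeps using `a` after "unbind" would still need to handle I/O errors gracefully, as the note above says.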
Re: [systemd-devel] [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring
On Thu, Nov 26, 2020 at 11:25:22AM +, Dave Howorth wrote:
> On Wed, 25 Nov 2020 19:02:21 -0800
> Vito Caputo wrote:
>
> > I've called this program jio, pronounced "jai-oh".
>
> Silly question, but how is 'jai' pronounced?

The name is like saying the letters I O with the J sound prefixed. Don't pronounce the letter J distinctly like the I and O; let the J blend into the I. Then get a bit lazy with the separation of I and O too, so it all flows with no breaks.

A friend pointed out the similarity to the Chinese word Jiayou:

https://en.wikipedia.org/wiki/Jiayou

Which might actually be appropriate, considering the io_uring performance exploration aspect. "Give it more gas!"

Cheers,
Vito Caputo
Re: [systemd-devel] [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring
On Wed, 25 Nov 2020 19:02:21 -0800
Vito Caputo wrote:

> I've called this program jio, pronounced "jai-oh".

Silly question, but how is 'jai' pronounced?
Re: [systemd-devel] Mounting / as writable without in `/etc/fstab`
Dear Mantas,

Thank you for your detailed reply.

On 26.11.20 at 09:12, Mantas Mikulėnas wrote:
> On Mon, Nov 23, 2020 at 5:23 PM Paul Menzel wrote:
>
> > Is an entry for / in `/etc/fstab` still needed, or is there a systemd
> > way of doing it?
>
> That *is* the systemd way -- the fstab entry will be read by
> systemd-remount-fs(8) and the new mount options applied.

Thank you. That wasn’t clear to me.

> > Installing Debian bullseye/testing with the Debian Installer, it creates
> > a GPT and `/etc/fstab`.
>
> [...]
>
> > Commenting out the entries for `/`, the root partition is mounted as
> > read-only.
> >
> >     $ findmnt /
> >     TARGET SOURCE         FSTYPE OPTIONS
> >     /      /dev/nvme0n1p2 ext4   ro,relatime
> >
> > Shouldn’t it be mounted as writable?
>
> No, if you had it initially mounted with 'ro' and did not leave any
> instructions for remounting, then it won't be remounted...

Understood.

> >     $ sudo /lib/systemd/systemd-remount-fs
> >     $ findmnt /
> >     TARGET SOURCE         FSTYPE OPTIONS
> >     /      /dev/nvme0n1p2 ext4   rw,relatime,errors=remount-ro

Sorry, that was my confusion. I checked it again, and I messed up during testing. This result was with the entry for / present in `/etc/fstab`, that is, not commented out.

> > The log says:
> >
> >     [2.320133] systemd[179]:
> >     /usr/lib/systemd/system-generators/systemd-gpt-auto-generator succeeded.
> >
> > I can work around it changing `ro` to `rw` on the Linux command line,
> > but I thought, it is possible without that.
>
> I would say that having the initramfs directly mount the filesystem as rw
> is the *preferred method*, not a workaround... Of course it depends on how
> your distro's initramfs wants to work, but at least that's what Arch does
> -- since fsck is run from the initramfs, there's not much point in later
> mounting it ro at all.

Sorry, I didn’t understand that last paragraph. I thought it was desirable to first mount it ro, so fsck can run, and then remount it read-write?

My use case is to boot without an initramfs. But now that I know that `/etc/fstab` is there to stay, I know what to do.

Kind regards,

Paul
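The point of the exchange above is that the remount only happens when `/etc/fstab` has an active entry for `/`. A small Python sketch of that check follows. It only parses fstab-style text and is not how systemd-remount-fs is implemented; the function name and the sample fstab contents are assumptions for illustration.

```python
def root_mount_options(fstab_text):
    """Return the option list of the first active `/` entry, or None if absent."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out entries
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/":
            return fields[3].split(",")
    return None


# Hypothetical fstab contents; the device name follows the findmnt output above.
active = "/dev/nvme0n1p2 / ext4 errors=remount-ro 0 1\n"
commented = "# /dev/nvme0n1p2 / ext4 errors=remount-ro 0 1\n"
```

Here `root_mount_options(active)` yields the options that would be applied on remount, while `root_mount_options(commented)` yields `None`, which corresponds to the case where a root filesystem mounted `ro` from the kernel command line simply stays read-only.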
Re: [systemd-devel] Mounting / as writable without in `/etc/fstab`
On Mon, Nov 23, 2020 at 5:23 PM Paul Menzel <pmenzel+systemd-de...@molgen.mpg.de> wrote:

> Dear systemd folks,
>
>
> Is an entry for / in `/etc/fstab` still needed, or is there a systemd
> way of doing it?
>

That *is* the systemd way -- the fstab entry will be read by systemd-remount-fs(8) and the new mount options applied.

>
> Installing Debian bullseye/testing with the Debian Installer, it creates
> a GPT and `/etc/fstab`.

[...]

> Commenting out the entries for `/`, the root partition is mounted as
> read-only.
>
>     $ findmnt /
>     TARGET SOURCE         FSTYPE OPTIONS
>     /      /dev/nvme0n1p2 ext4   ro,relatime
>
> Shouldn’t it be mounted as writable?
>

No, if you had it initially mounted with 'ro' and did not leave any instructions for remounting, then it won't be remounted...

>
>     $ sudo /lib/systemd/systemd-remount-fs
>     $ findmnt /
>     TARGET SOURCE         FSTYPE OPTIONS
>     /      /dev/nvme0n1p2 ext4   rw,relatime,errors=remount-ro
>
> The log says:
>
>     [2.320133] systemd[179]:
>     /usr/lib/systemd/system-generators/systemd-gpt-auto-generator succeeded.
>
> I can work around it changing `ro` to `rw` on the Linux command line,
> but I thought, it is possible without that.
>

I would say that having the initramfs directly mount the filesystem as rw is the *preferred method*, not a workaround... Of course it depends on how your distro's initramfs wants to work, but at least that's what Arch does -- since fsck is run from the initramfs, there's not much point in later mounting it ro at all.

--
Mantas Mikulėnas