Re: [systemd-devel] jio test results

2020-11-26 Thread Paul Menzel

Dear Vito,


On 26.11.20 at 19:54, Vito Caputo wrote:
> On Thu, Nov 26, 2020 at 12:28:16PM +0100, Paul Menzel wrote:

[…]

>> Thank you. It builds fine on Debian Sid/unstable with GCC 10.2.0-19 (some
>> warnings), and the results are attached.
>
> Any chance you could mail me the warnings?  I see none here, would
> prefer a clean build for others.


Hopefully there won’t be line wrapping when viewed with `format=flowed`.

```
$ gcc --version
gcc (Debian 10.2.0-19) 10.2.0
[…]
$ make
[…]
  CC   jio-journals.o
journals.c: In function ‘got_hash_table_iter_object_header’:
journals.c:361:4: warning: converting a packed ‘HashedObjectHeader’ pointer (alignment 1) to a ‘DataObject’ pointer (alignment 8) may result in an unaligned pointer value [-Waddress-of-packed-member]
  361 |DataObject *data_object = (DataObject *)iter_object_header;
  |^~
In file included from journals.h:14,
 from journals.c:30:
upstream/journal-def.h:80:8: note: defined here
   80 | struct HashedObjectHeader {
  |^~
upstream/journal-def.h:95:8: note: defined here
   95 | struct DataObject DataObject__contents;
  |^~
journals.c:372:4: warning: converting a packed ‘HashedObjectHeader’ pointer (alignment 1) to a ‘FieldObject’ pointer (alignment 8) may result in an unaligned pointer value [-Waddress-of-packed-member]
  372 |FieldObject *field_object = (FieldObject *)iter_object_header;
  |^~~
In file included from journals.h:14,
 from journals.c:30:
upstream/journal-def.h:80:8: note: defined here
   80 | struct HashedObjectHeader {
  |^~
upstream/journal-def.h:105:8: note: defined here
  105 | struct FieldObject FieldObject__contents;
  |^~~
[…]
```
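
For readers unfamiliar with `-Waddress-of-packed-member`: GCC emits it when a pointer to a packed type (alignment 1) is cast to a pointer type that requires stricter alignment. One common way to sidestep it is to copy the bytes into a naturally aligned object with `memcpy` rather than casting. The sketch below illustrates only that idea. The `demo_header` and `demo_object` types and the `demo_load_object` helper are hypothetical stand-ins, not the definitions from `upstream/journal-def.h`, and this is not necessarily how jio does or should address the warning.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the journal-def.h types; the real layouts differ. */
struct demo_header {
    uint8_t type;
    uint8_t flags;
    uint8_t reserved[6];
    uint64_t size;
} __attribute__((packed));          /* alignment 1, like a packed header */

struct demo_object {
    struct demo_header header;
    uint64_t hash;                  /* gives the struct uint64_t's natural alignment */
};

/* Copy an object out of a possibly unaligned buffer instead of casting the
 * pointer. The caller must guarantee that at least sizeof(struct demo_object)
 * bytes are readable at `bytes`. */
static struct demo_object demo_load_object(const void *bytes)
{
    struct demo_object object;

    assert(bytes != NULL);
    memcpy(&object, bytes, sizeof object);
    return object;
}
```

The copy costs a few cycles per object but keeps every access aligned. Whether that trade is acceptable depends on how hot the code path is.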


Kind regards,

Paul
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring

2020-11-26 Thread Vito Caputo
On Thu, Nov 26, 2020 at 07:58:38PM +0100, Lennart Poettering wrote:
> On Wed, 25.11.20 19:02, Vito Caputo (vcap...@pengaru.com) wrote:
> 
> > Hello systemd-devel,
> >
> > Recent discussion here about journal space consumption happened to
> > occur while I was exploring use of the new io_uring linux kernel
> > interface in combination with journal files.
> 
> I'd be really curious if an io_uring-based reader could outperform the
> mmap-based ones. I suspect it could, since paging in stuff
> is slow, and doesn't really allow for reordering.
> 

Same here, I expect superior uncached performance, and either equal or
slightly worse cached performance.

So the worst case should be better, and it might be a worthwhile
trade-off for that alone.  The programming style is also inherently quite
different; I'm not sure whether that's a net win or loss, and it probably
depends on the preference of whoever's stuck maintaining it.

I'm also curious to see if io_uring is efficient enough that one can
treat the page cache as the application cache, and completely skip
implementing (and allocating space for) any caching layer in the
userspace process.

In jio currently, all the tiny object header reads of `jio report
usage` are done with discrete io_uring requests; there's no
equivalent to the mmap-cache found in systemd-journald.

This results in frequently crossing the application:kernel boundary,
granularly fetching journal data from the page cache.  It makes for a
*very* simple implementation, and the kind of thing you'd strongly try
to avoid when using something like plain read() syscalls if concerned
with performance.

My knee-jerk reaction to such an approach is it'd be crazy to not even
cache the entry arrays and some hot set of entries and data objects
local to the process during searches at least.  But starting with a
dumb implementation leaning hard on io_uring seems worth at least
measuring, and maybe it'd prove Good Enough (tm), while staying within
a much smaller address space.  And it's not like such an
implementation would be difficult to extend with some targeted caching
of objects where needed.

BTW, do you have any objections to jio discussions occurring on
systemd-devel?

Regards,
Vito Caputo


Re: [systemd-devel] [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring

2020-11-26 Thread Lennart Poettering
On Wed, 25.11.20 19:02, Vito Caputo (vcap...@pengaru.com) wrote:

> Hello systemd-devel,
>
> Recent discussion here about journal space consumption happened to
> occur while I was exploring use of the new io_uring linux kernel
> interface in combination with journal files.

I'd be really curious if an io_uring-based reader could outperform the
mmap-based ones. I suspect it could, since paging in stuff
is slow, and doesn't really allow for reordering.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] jio test results (was: [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring)

2020-11-26 Thread Vito Caputo
On Thu, Nov 26, 2020 at 12:28:16PM +0100, Paul Menzel wrote:
> [No idea, if I should carbon copy the list or not.]

I hope you don't mind, but I've CC'd the list in replying.

At this point it's helpful for others to see:

1. jio compiled and worked
2. jio didn't crash or rm -Rf / your system
3. what kind of information it produced

> 
> On 26.11.20 at 04:02, Vito Caputo wrote:
> > Hello systemd-devel,
> > 
> > Recent discussion here about journal space consumption happened to
> > occur while I was exploring use of the new io_uring linux kernel
> > interface in combination with journal files.
> > 
> > What began as a curiosity about this new kernel interface, and what it
> > would be like to program journal file processing with some kind of
> > continuation-passing style C code bolted onto it, evolved into
> > something already providing new visibility into journal-file space
> > utilization not AFAIK currently offered by systemd's journalctl (not
> > that it couldn't be added there too).
> > 
> > I've called this program jio, pronounced "jai-oh".
> 
> Thank you for writing and sharing this.
> 

You're welcome, it was fun to write. Thanks for being a guinea pig!

> > At this time jio implements three basic functions:
> > 
> > 1. `jio report usage`
> > 
> >  READ-ONLY
> > 
> >  Measures and reports space actually used by objects in all
> >  accessible journal files, classified by object type.
> > 
> > 
> > 2. `jio report tail-waste`
> > 
> >  READ-ONLY
> > 
> >  Measures and reports unused space allocated to the tail ends of
> >  all journal files, classified by journal state.  Journal state
> >  being Online, Offline, or Archived.
> 
> Thank you. It builds fine on Debian Sid/unstable with GCC 10.2.0-19 (some
> warnings), and the results are attached.
> 

Any chance you could mail me the warnings?  I see none here, would
prefer a clean build for others.



> Per-object-type usage:
>           UNUSED: [0] 0.00 B
>             Data: [2404148] 261.85 MiB
>            Field: [6491] 331.51 KiB
>            Entry: [1521426] 712.38 MiB
>    DataHashTable: [100] 355.56 MiB
>   FieldHashTable: [100] 521.88 KiB
>       EntryArray: [554339] 577.14 MiB
>              Tag: [0] 0.00 B
> Aggregate object usage: 1.86 GiB of 2.22 GiB spanning 100 journal files
>
> Totals:
>   Tail-waste by state:
>       Offline [3]: 7.23 MiB, 2% of all tail-waste
>        Online [1]: 7.37 MiB, 2% of all tail-waste
>     Archived [96]: 339.82 MiB, 95% of all tail-waste
>
>   Aggregate tail-waste: 354.42 MiB, 15% of 2.22 GiB spanning 100 journal files

This is useful information.  On my system, an accumulation of Offline
user journals accounts for the majority of the tail-waste, and because
they are Offline, `jio reclaim tail-waste` won't touch them.  We need
more samples from the field, but your numbers suggest that this may be
something unique to my system.
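
As a cross-check on the quoted report, the per-state figures add up to the aggregate (7.23 + 7.37 + 339.82 = 354.42 MiB), and the printed percentages are consistent with integer truncation: for example, 339.82 / 354.42 is about 95.9%, shown as 95%. The helper below is a minimal sketch of that arithmetic. `percent_of` is a hypothetical name, and truncation is an assumption inferred from the numbers, not something confirmed from jio's source.

```c
#include <assert.h>

/* Integer percentage of `part` relative to `whole`, truncated toward zero.
 * Truncation (rather than rounding) is an assumption inferred from the
 * figures in the quoted report. */
static unsigned percent_of(double part, double whole)
{
    assert(whole > 0.0);
    assert(part >= 0.0);
    return (unsigned)(part * 100.0 / whole);
}
```

For the aggregate line, 2.22 GiB is 2273.28 MiB, so `percent_of(354.42, 2.22 * 1024.0)` comes to roughly 15.6 before truncation, matching the reported 15%.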

Thanks again!


[systemd-devel] systemd 247 released

2020-11-26 Thread systemd tag bot
A new, official systemd release has just been tagged. Please download
the tarball here:

https://github.com/systemd/systemd/archive/v247.tar.gz

Changes since the previous release:

* KERNEL API INCOMPATIBILITY: Linux 4.14 introduced two new uevents
  "bind" and "unbind" to the Linux device model. When this kernel
  change was made, systemd-udevd was only minimally updated to handle
  and propagate these new event types. The introduction of these new
  uevents (which are typically generated for USB devices and devices
  needing a firmware upload before being functional) resulted in a
  number of issues which we so far didn't address. We hoped the kernel
  maintainers would themselves address these issues in some form, but
  that did not happen. To handle them properly, many (if not most) udev
  rules files shipped in various packages need updating, and so do many
  programs that monitor or enumerate devices with libudev or sd-device,
  or otherwise process uevents. Please note that this incompatibility
  is not the fault of systemd or udev, but caused by an incompatible kernel
  change that happened back in Linux 4.14, but is becoming more and
  more visible as the new uevents are generated by more kernel drivers.

  To minimize issues resulting from this kernel change (but not avoid
  them entirely) starting with systemd-udevd 247 the udev "tags"
  concept (which is a concept for marking and filtering devices during
  enumeration and monitoring) has been reworked: udev tags are now
  "sticky", meaning that once a tag is assigned to a device it will not
  be removed from the device again until the device itself is removed
  (i.e. unplugged). This makes sure that any application monitoring
  devices that match a specific tag is guaranteed to both see uevents
  where the device starts being relevant, and those where it stops
  being relevant (the latter now regularly happening due to the new
  "unbind" uevent type). The udev tags concept is hence now a concept
  tied to a *device* instead of a device *event* — unlike for example
  udev properties whose lifecycle (as before) is generally tied to a
  device event, meaning that the previously determined properties are
  forgotten whenever a new uevent is processed.

  With the newly redefined udev tags concept, sometimes it's necessary
  to determine which tags are the ones applied by the most recent
  uevent/database update, in order to discern them from those
  originating from earlier uevents/database updates of the same
  device. To accommodate this, a new automatic property CURRENT_TAGS
  has been added that works similarly to the existing TAGS property but
  only lists tags set by the most recent uevent/database
  update. Similarly, the libudev/sd-device API has been updated with
  new functions to enumerate these 'current' tags, in addition to the
  existing APIs that now enumerate the 'sticky' ones.

  To properly handle "bind"/"unbind" on Linux 4.14 and newer it is
  essential that all udev rules files and applications are updated to
  handle the new events. Specifically:

  • All rule files that currently use a header guard similar to
    ACTION!="add|change",GOTO="xyz_end" should be updated to use
    ACTION=="remove",GOTO="xyz_end" instead, so that the
    properties/tags they add are also applied whenever "bind" (or
    "unbind") is seen. (This is most important for all physical device
    types, i.e. those for which "bind" and "unbind" are currently
    generated. For all other device types this change is still
    recommended but not as important, though it certainly prepares for
    future kernel uevent type additions.)

  • Similarly, all code monitoring devices that contains an 'if' branch
    discerning the "add" + "change" uevent actions from all other
    uevent actions (i.e. considering devices only relevant after "add"
    or "change", and irrelevant on all other events) should be reworked
    to instead negatively check for "remove" only (i.e. considering
    devices relevant after all event types, except for "remove", which
    invalidates the device). Note that this also means that devices
    should be considered relevant on "unbind", even though conceptually
    this, in some form, invalidates the device. Since the precise
    effect of "unbind" is not generically defined, devices should be
    considered relevant even after "unbind"; however, I/O errors
    accessing the device should then be handled gracefully.
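
The second point can be sketched as a pair of predicates on the uevent action string. The function names `device_relevant_old` and `device_relevant_new` are hypothetical and only contrast the two styles of check. In a libudev-based monitor, the action string would typically come from `udev_device_get_action()`, which is left out here so the sketch needs nothing beyond the C standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Old style: only "add" and "change" mark a device as relevant, so the
 * newer "bind" and "unbind" actions wrongly make it look irrelevant. */
static bool device_relevant_old(const char *action)
{
    assert(action != NULL);
    return strcmp(action, "add") == 0 || strcmp(action, "change") == 0;
}

/* Recommended style: every action except "remove" keeps the device
 * relevant, including "bind", "unbind" and any future action types. */
static bool device_relevant_new(const char *action)
{
    assert(action != NULL);
    return strcmp(action, "remove") != 0;
}
```

With the negative check, a device stays tracked after "unbind", which is why the notes ask for I/O errors on such devices to be handled gracefully.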

  

Re: [systemd-devel] [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring

2020-11-26 Thread Vito Caputo
On Thu, Nov 26, 2020 at 11:25:22AM +, Dave Howorth wrote:
> On Wed, 25 Nov 2020 19:02:21 -0800
> Vito Caputo wrote:
> 
> > I've called this program jio, pronounced "jai-oh".
> 
> Silly question, but how is 'jai' pronounced?

The name is like saying the letters I O, with the J sound prefixed,
but not pronouncing the letter J distinctly like the I and O, let the
J blend into the I.  Then get a bit lazy with the separation of I and
O too so it all flows with no breaks.

A friend pointed out the similarity to the Chinese word Jiayou:
https://en.wikipedia.org/wiki/Jiayou

Which might actually be appropriate, considering the exploration
of io_uring performance aspect.  "Give it more gas!"

Cheers,
Vito Caputo


Re: [systemd-devel] [ANNOUNCE] jio is an experimental systemd-journald journal file tool utilizing io_uring

2020-11-26 Thread Dave Howorth
On Wed, 25 Nov 2020 19:02:21 -0800
Vito Caputo wrote:

> I've called this program jio, pronounced "jai-oh".

Silly question, but how is 'jai' pronounced?


Re: [systemd-devel] Mounting / as writable without in `/etc/fstab`

2020-11-26 Thread Paul Menzel

Dear Mantas,



Thank you for your detailed reply.


On 26.11.20 at 09:12, Mantas Mikulėnas wrote:
> On Mon, Nov 23, 2020 at 5:23 PM Paul Menzel wrote:
>
>> Is an entry for / in `/etc/fstab` still needed, or is there a systemd
>> way of doing it?
>
> That *is* the systemd way -- the fstab entry will be read by
> systemd-remount-fs(8) and the new mount options applied.

Thank you. That wasn’t clear to me.

>> Installing Debian bullseye/testing with the Debian Installer, it creates
>> a GPT and `/etc/fstab`. [...]
>> Commenting out the entries for `/`, the root partition is mounted as
>> read-only.
>>
>>   $ findmnt /
>>   TARGET SOURCE         FSTYPE OPTIONS
>>   /      /dev/nvme0n1p2 ext4   ro,relatime
>>
>> Shouldn’t it be mounted as writable?
>
> No, if you had it initially mounted with 'ro' and did not leave any
> instructions for remounting, then it won't be remounted...

Understood.

>>   $ sudo /lib/systemd/systemd-remount-fs
>>   $ findmnt /
>>   TARGET SOURCE         FSTYPE OPTIONS
>>   /      /dev/nvme0n1p2 ext4   rw,relatime,errors=remount-ro

Sorry, that was my confusion. I checked it again, and I messed up during
testing. This result was with the entry for / present in `/etc/fstab`,
that is, not commented out.

>> The log says:
>>
>>   [2.320133] systemd[179]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator succeeded.
>>
>> I can work around it changing `ro` to `rw` on the Linux command line,
>> but I thought, it is possible without that.
>
> I would say that having the initramfs directly mount the filesystem as rw
> is the *preferred method*, not a workaround... Of course it depends on how
> your distro's initramfs wants to work, but at least that's what Arch does
> -- since fsck is run from the initramfs, there's not much point in later
> mounting it ro at all.

Sorry, I didn’t understand that last paragraph. I thought it’s
desirable to first mount it ro, so fsck can run, and then remount it
read-write?

My use case is to boot without initramfs. But now that I know that
`/etc/fstab` is there to stay, I know what to do.



Kind regards,

Paul


Re: [systemd-devel] Mounting / as writable without in `/etc/fstab`

2020-11-26 Thread Mantas Mikulėnas
On Mon, Nov 23, 2020 at 5:23 PM Paul Menzel <
pmenzel+systemd-de...@molgen.mpg.de> wrote:

> Dear systemd folks,
>
>
> Is an entry for / in `/etc/fstab` still needed, or is there a systemd
> way of doing it?
>

That *is* the systemd way -- the fstab entry will be read by
systemd-remount-fs(8) and the new mount options applied.


>
> Installing Debian bullseye/testing with the Debian Installer, it creates
> a GPT and `/etc/fstab`. [...]
> Commenting out the entries for `/`, the root partition is mounted as
> read-only.
>
>  $ findmnt /
>  TARGET SOURCE         FSTYPE OPTIONS
>  /      /dev/nvme0n1p2 ext4   ro,relatime
>
> Shouldn’t it be mounted as writable?
>

No, if you had it initially mounted with 'ro' and did not leave any
instructions for remounting, then it won't be remounted...


>
>  $ sudo /lib/systemd/systemd-remount-fs
>  $ findmnt /
>  TARGET SOURCE         FSTYPE OPTIONS
>  /      /dev/nvme0n1p2 ext4   rw,relatime,errors=remount-ro
>
> The log says:
>
>  [2.320133] systemd[179]:
> /usr/lib/systemd/system-generators/systemd-gpt-auto-generator succeeded.
>
> I can work around it changing `ro` to `rw` on the Linux command line,
> but I thought, it is possible without that.
>

I would say that having the initramfs directly mount the filesystem as rw
is the *preferred method*, not a workaround... Of course it depends on how
your distro's initramfs wants to work, but at least that's what Arch does
-- since fsck is run from the initramfs, there's not much point in later
mounting it ro at all.

-- 
Mantas Mikulėnas