Re: [systemd-devel] systemd hackfest talking points

2018-10-03 Thread vcaputo
On Wed, Oct 03, 2018 at 08:01:19AM +, Zbigniew Jędrzejewski-Szmek wrote:
> Hi,
> 
> we had a systemd hackfest/talkfest last Sunday in Berlin as part of 
> AllSystemdGo 2018.
> 
> Here is a copy of the doc we used to discuss the technical &
> documentations topics:
> https://docs.google.com/document/d/12mWXZem7IOc9u-Db04Cy-NPi39LijiiMlTqX-0DYO98/edit
> 
> (I tried to convert this to text, but there's a lot of markup.)
> 

Thanks for sharing this.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] journald fsync() errors

2018-04-03 Thread vcaputo
Back when I worked on making fsync() in journald asynchronous, I
preserved the existing strategy of ignoring fsync() errors.

In reading [1], I am reminded of this situation and am again wondering
why this is the case.  Shouldn't journald trigger a journal rotate when
fsync() realizes an IO error, marking the previous journal as corrupt?

Can someone remind me of the rationale behind the existing approach?

Regards,
Vito Caputo

[1] 
https://www.postgresql.org/message-id/flat/CAMsr%2BYE5Gs9iPqw2mQ6OHt1aC5Qk5EuBFCyG%2BvzHun1EqMxyQg%40mail.gmail.com
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-journald may crash during memory pressure

2018-02-10 Thread vcaputo
On Sat, Feb 10, 2018 at 03:05:16PM +0100, Kai Krakow wrote:
> Am Sat, 10 Feb 2018 14:23:34 +0100 schrieb Kai Krakow:
> 
> > Am Sat, 10 Feb 2018 02:16:44 +0200 schrieb Uoti Urpala:
> > 
> >> On Fri, 2018-02-09 at 12:41 +0100, Lennart Poettering wrote:
> >>> This last log lines indicates journald wasn't scheduled for a long
> >>> time which caused the watchdog to hit and journald was aborted.
> >>> Consider increasing the watchdog timeout if your system is indeed that
> >>> loaded and that's is supposed to be an OK thing...
> >> 
> >> BTW I've seen the same behavior on a system with a single active
> >> process that uses enough memory to trigger significant swap use. I
> >> wonder if there has been a regression in the kernel causing misbehavior
> >> when swapping? The problems aren't specific to journald - desktop
> >> environment can totally freeze too etc.
> > 
> > This problem seems to be there since kernel 4.9 which was a real pita in
> > this regard. It's progressively becoming better since kernel 4.10. The
> > kernel seems trying to prevent swapping at any cost since then, at least
> > at the cost of much higher latency, and at the cost of pushing all cache
> > out of RAM.
> > 
> > The result is processes stuck for easily 30 seconds and more during
> > memory pressure. Sometimes I see the kernel loudly complaining in dmesg
> > about high wait times for allocating RAM, especially from the btrfs
> > module. Thus, the biggest problem may be that kernel threads itself get
> > stuck in memory allocations and are a victim of high latency.
> > 
> > Currently I'm running my user session in a slice with max 80% RAM which
> > seems to help. It helps not discarding all cache. I also put some
> > potentially high memory users (regarding cache and/or resident mem) into
> > slices with carefully selected memory limits (backup and maintenance
> > services). Slices limited in such a way will start swapping before cache
> > is discarded and everything works better again. Part of this problem may
> > be that I have one process running which mmaps and locks 1G of memory
> > (bees, a btrfs deduplicator).
> > 
> > This system has 16G of RAM which is usually plenty but I use tmpfs to
> > build packages in Gentoo, and while that worked wonderfully before 4.9,
> > I have to be really careful now. The kernel happily throws away cache
> > instead of swapping early. Setting vm.swappiness differently seems to
> > have no perceivable effect.
> > 
> > Software that uses mmap is the first latency victim of this new
> > behavior.
> > As such, also systemd-journald seems to be hit hard by this.
> > 
> > After the system recovered from high memory pressure (which can take
> > 10-15 minutes, resulting in a loadavg of 400+), it ends up with some
> > gigabytes of inactive memory in the swap which it will only swap back in
> > then during shutdown (which will also take some minutes then).
> > 
> > The problem since 4.9 seems to be that the kernel tends to do swap
> > storms instead of constantly swapping out memory at low rates during
> > usage. The swap storms totally thrash the system.
> > 
> > Before 4.9, the kernel had no such latency spikes under memory pressure.
> > Swap would usually grew slowly over time, and the system felt sluggish
> > one or another time but still usable wrt latency. I usually ended up
> > with 5-8G of swap usage, and that was no problem. Now, swap only
> > significantly grows during swap storms with an unusable system for many
> > minutes, with latencies of 10+ seconds around twice per minute.
> > 
> > I had no swap storm yet since the last boot, and swap usage is around
> > 16M now. Before kernel 4.9, this would be much higher already.
> 
> After some more research, I found that vm.watermark_scale_factor may be 
> the knob I am looking for. I'm going to watch behavior now with a higher 
> factor (default = 10, now 200).
> 

Have you reporteed this to the kernel maintainers?  LKML?

While this is interesting to read on systemd-devel, it's not right
venue.  What you describe sounds like a regression that probably should
be improved upon.

Also, out of curiosity, are you running dmcrypt in this scenario?  If
so, is swap on dmcrypt as well?

Regards,
Vito Caputo


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-journald may crash during memory pressure

2018-02-08 Thread vcaputo
Note the logs you've pasted portray a watchdog timeout which resulted in
SIGABRT and a subsequent core dump.

This is not really a journald "crash", and you can increase the watchdog
timeout or disable it entirely to make it more tolerant of thrashing.

What I presume happened is the system was thrashing and a page fault in
the mapped journal took too long to complete.

Regards,
Vito Caputo


On Thu, Feb 08, 2018 at 11:50:45PM +0100, Kai Krakow wrote:
> Hello!
> 
> During memory pressure and/or high load, journald may crash. This is
> probably due to design using mmap but it should really not do this.
> 
> On 32-bit systems, we are seeing such crashes constantly although the
> available memory is still gigabytes (it's a 32-bit userland running in a
> 64-bit kernel).
> 
> 
> [82988.670323] systemd[1]: systemd-journald.service: Main process exited, 
> code=dumped, status=6/ABRT
> [82988.670684] systemd[1]: systemd-journald.service: Failed with result 
> 'watchdog'.
> [82988.685928] systemd[1]: systemd-journald.service: Service has no hold-off 
> time, scheduling restart.
> [82988.709575] systemd[1]: systemd-journald.service: Scheduled restart job, 
> restart counter is at 2.
> [82988.717390] systemd[1]: Stopped Flush Journal to Persistent Storage.
> [82988.717411] systemd[1]: Stopping Flush Journal to Persistent Storage...
> [82988.726303] systemd[1]: Stopped Journal Service.
> [82988.844462] systemd[1]: Starting Journal Service...
> [82993.633781] systemd-coredump[22420]: MESSAGE=Process 461 (systemd-journal) 
> of user 0 dumped core.
> [82993.633811] systemd-coredump[22420]: Coredump diverted to 
> /var/lib/systemd/coredump/core.systemd-journal.0.3d492c866f254fb981f916c6c3918046.461.151812537700.lz4
> [82993.633813] systemd-coredump[22420]: Stack trace of thread 461:
> [82993.633814] systemd-coredump[22420]: #0  0x7f940241d4dd 
> journal_file_move_to_object (libsystemd-shared-237.so)
> [82993.633815] systemd-coredump[22420]: #1  0x7f940241e910 
> journal_file_find_data_object_with_hash (libsystemd-shared-237.so)
> [82993.633816] systemd-coredump[22420]: #2  0x7f940241fe81 
> journal_file_append_data (libsystemd-shared-237.so)
> [82993.633817] systemd-coredump[22420]: #3  0x556a343ae9ea 
> write_to_journal (systemd-journald)
> [82993.633819] systemd-coredump[22420]: #4  0x556a343b0974 
> server_dispatch_message (systemd-journald)
> [82993.633820] systemd-coredump[22420]: #5  0x556a343b24bb 
> stdout_stream_log (systemd-journald)
> [82993.633821] systemd-coredump[22420]: #6  0x556a343b2afe 
> stdout_stream_line (systemd-journald)
> [82993.723157] systemd-coredum: 7 output lines suppressed due to ratelimiting
> [83002.830610] systemd-journald[22424]: File 
> /var/log/journal/121b87ca633e8ac001665668001b/system.journal corrupted or 
> uncleanly shut down, renaming and replacing.
> [83014.774538] systemd[1]: Started Journal Service.
> [83119.277143] systemd-journald[22424]: File 
> /var/log/journal/121b87ca633e8ac001665668001b/user-500.journal corrupted 
> or uncleanly shut down, renaming and replacing.
> 
> 
> -- 
> Regards,
> Kai
> 
> Replies to list-only preferred.
> 
> ___
> systemd-devel mailing list
> systemd-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/systemd-devel
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] hash context not closed.

2018-02-03 Thread vcaputo
On Sat, Feb 03, 2018 at 09:14:40PM +0100, Stef Bon wrote:
> The testing of the new code requires me too much (meson is required..
> and I don't want to replace my existing systemd) and I do not expect
> to add more patches is near future.
> 
> I've got a patch:
> 
> diff --git a/src/basic/gcrypt-util.c b/src/basic/gcrypt-util.c
> index 1bfb77672..c7c07e3b7 100644
> --- a/src/basic/gcrypt-util.c
> +++ b/src/basic/gcrypt-util.c
> @@ -46,6 +46,7 @@ int string_hashsum(const char *s, size_t len, int
> md_algorithm, char **out) {
>  size_t hash_size;
>  void *hash;
>  char *enc;
> +int res=-EIO;
> 
>  initialize_libgcrypt(false);
> 
> @@ -59,14 +60,24 @@ int string_hashsum(const char *s, size_t len, int
> md_algorithm, char **out) {
>  gcry_md_write(md, s, len);
> 
>  hash = gcry_md_read(md, 0);
> +
>  if (!hash)
> -return -EIO;
> +goto closemd;
> 
> +res = -ENOMEM;
>  enc = hexmem(hash, hash_size);
> -if (!enc)
> -return -ENOMEM;
> 
> -*out = enc;
> -return 0;
> +if (enc) {
> +
> +*out = enc;
> +res = 0;
> +
> +}
> +
> +closemd:
> +
> +gcry_md_close(md);
> +
> +return res;
>  }
>  #endif
> 

I'm going to be offline for a few days, so hopefully someone else on the
list will take over here.

Cheers,
Vito Caputo
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] hash context not closed.

2018-02-03 Thread vcaputo
On Sat, Feb 03, 2018 at 08:38:38PM +0100, Stef Bon wrote:
> 2018-02-03 20:27 GMT+01:00  :
> 
> >
> > Nice catch, that indeed looks like a leak.  Would you like to prep a
> > PR fixing it?
> >
> 
> PR? I do not know what you mean. A pull request?
> I can write a patch.

The systemd project uses github.  You can create a free github account
if you don't already have one, clone the systemd project to your
account, then push your patch to your cloned repo on github using git.

Once pushed, you create a pull request (PR) of your changes on github
for the systemd project, where we'll be able to review and merge it in
the normal workflow.

If you haven't done this before, it can seem a bit onerous at first.
It's not that bad though, and you'll then be positioned to readily
participate in the abundance of open development occurring on the
platform, in addition to future systemd contributions.

Regards,
Vito Caputo
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] hash context not closed.

2018-02-03 Thread vcaputo
On Sat, Feb 03, 2018 at 07:26:24PM +0100, Stef Bon wrote:
> Hi,
> 
> when I look at the function string_hashsum it looks like the context
> is not closed when done.
> 
> After a succesfull gcry_md_open the hash context md should be closed
> when leaving this function. That does not happen.
> 

Nice catch, that indeed looks like a leak.  Would you like to prep a
PR fixing it?

Regards,
Vito Caputo
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] How to make journald increase logs storage capacity?

2018-01-21 Thread vcaputo
On Sat, Jan 20, 2018 at 08:28:21PM +0300, 4xy wrote:
> I have limit in 4Gb for journald regardless of the settings I've made. I see
> this line in the manual
> 
> |$ sudo systemctl status systemd-journald ... jan 20 15:44:26 host
> systemd-journald[1218]: System journal (/var/log/journal/) is 4.5G, max
> 4.0G, 0B free. jan 20 15:44:26 host systemd-journald[1218]: Journal started|
> 
> I see this line in the manual.
> 
> > The first pair defaults to 10%  and the second to 15% of the size of the
> > respective file system, but
> each value is capped to 4G.
> 

This is referencing the defaults configured automatically, they're first
computed as a %age, confined to 4G.  The wording of that sentence could
probably be improved in the documentaiton.

> I have about 1Gb logs per day. I would like to have ability to keep them
> about 3 months.
> 

It sounds like you want to keep around 100G of logs:

SystemMaxUse=100G


Note that performance of commands like `journalctl` and even `systemctl
status $unit` drops significantly with large numbers of journals.
As a consequence, you may want to increase the maximum size of the
individual journal files so there are fewer of them:

SystemMaxFileSize=10G

Something like that...

Regards,
Vito Caputo
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemctl is-active

2018-01-15 Thread vcaputo
On Tue, Jan 16, 2018 at 12:17:52AM +0100, Michał Zegan wrote:
> Hello.
> When systemctl is-active was added?
> I need this to check if I can safely use it to check if unit is active
> if I don't have prior knowledge about the systemd version I am running
> this on, this is for an ansible playbook.
> 

It appears "is-active" was added to systemctl as part of:

 commit ee5762e3780c048b230e8c1e7659e40fc1f443bf
 Author: Lennart Poettering 
 Date:   Sat Jul 24 00:53:33 2010 +0200
 
 systemctl: fold systemd-install into systemctl
 

The earliest tagged release this commit made it into according to
`git tag --contains ee5762e378` is v4.

Regards,
Vito Caputo
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Illegal CPUID instruction causes systemd core dump

2017-12-28 Thread vcaputo
On Thu, Dec 28, 2017 at 10:08:59PM +0100, Lennart Poettering wrote:
> On Do, 28.12.17 20:26, Reindl Harald (h.rei...@thelounge.net) wrote:
> 
> > 
> > 
> > Am 28.12.2017 um 20:07 schrieb tedheadster:
> > > I am doing regression testing on old hardware. systemd-233 just
> > > generated the following error on startup:
> > > 
> > > I believe it is getting an illegal instruction trap on this first
> > > generation 486 because it is calling "cpuid" in detect_vm_cpuid()
> > > without first checking if the hardware supports it; it doesn't in this
> > > case.
> > > 
> > > The gcc compiler provides a workaround in the cpuid.h header file. You
> > > can call __get_cpuid_max() first and check the return value > 0.
> > > 
> > > https://stackoverflow.com/questions/14266772/how-do-i-call-cpuid-in-linux#14266932
> > > 
> > > The Linux kernel still supports the 486 so we have to code around this
> > > case, even if it is ancient hardware
> > 
> > don't get me wrong - i am for 15 years now in the IT and my first PC in 1999
> > was a i686
> > 
> > i don't see how a brand new systemd and a mordern userland is supposed to
> > run on 20 years or older hardware where nearly eveyr distribution these days
> > is i586 or i686 only or starts to drop 32bit at all
> 
> Well, we carry compat code for m68k, hence I figure i486 support
> should be OK to have, too. I figure m68k processors are even more
> legacy than i486 is, and certainly require more porting work than i486
> does, hence i486 support should be fine to have by a long shot.
> 

Additionally Intel was still manufacturing 386 and 486 chips as recently
as 2007 [1].  Why not maintain support for such systems?  It shouldn't
be a substantial burden with a codebase that already supports 32-bit in
general.

Regards,
Vito Caputo

[1] https://www.theregister.co.uk/2006/05/18/intel_cans_386_486_960_cpus/
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] How do I remove dead process managed by Systemd?

2017-12-04 Thread vcaputo
On Mon, Dec 04, 2017 at 07:14:51PM +0100, Lennart Poettering wrote:
> On Mo, 04.12.17 23:15, Yun-Chih Chen (yunchih@gmail.com) wrote:
> 
> > Hi, fellows:
> > 
> > How do I remove a dead process (D-state process) wrapped as a Systemd
> > service unit?  The process's parent is systemd (pid=1) and I still did not
> > find a way to remove such process without rebooting.  I have tried the
> > following:
> > 
> > $ systemctl restart   # timeout
> > $ systemctl daemon-reexec
> > $ kill -9  # won't work of course
> > 
> > Or anyone knows of hacks that do the trick?
> 
> You cannot kill "D" state processes on Linux. That's what "D" state
> means ultimately: *uninterruptible* sleep. The kill command can't kill
> such processes, and neither can systemd. If you see processes like
> this, then either you have very slow I/O, or this points to some form
> of kernel/driver problem, as sleeping in this state should usually be
> for short periods only.
> 

That's not strictly accurate AFAIK.

Not all D state processes are unkillable.  D state contributes to load
average, so it's not just to represent unkillable tasks.  IIRC it's
ambiguous from userspace by just glancing at D state if the process is
killable, but it can happen that the process is plain uninterruptible.

I don't have time at the moment to do much digging to confirm.  If you
glance at wait.h from the kernel headers, see wait_event_killable().
TASK_KILLABLE includes TASK_UNINTERRUPTIBLE, and I believe that shows as a
D state in userspace, but it also includes TASK_WAKEKILL.

What Lennart describes still holds true for Yun-Chih; if the process is in
an unkillable state then there's nothing systemd can do.

Regards,
Vito Caputo
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Outputting STDOUT and STDERR to current $SSH_TTY

2017-08-11 Thread vcaputo
This is not recommended.  Forwarding is currently implemented
synchronously in journald, and can have a very substantial negative
impact on journald performance in general as a result.

BTW: are we not averse to top-posting on systemd-devel?

Regards,
Vito Caputo


On Fri, Aug 11, 2017 at 05:47:15AM +, Zhujiafa wrote:
> add
> " systemd.journald.forward_to_console=1 "
>  to your kernel  parameters.
> 
> 
> 
> 
> ZhuJiafa
> email: zhuji...@huawei.com
> tel: +86 18651652661
> 
> From: Reindl Harald
> Date: 2017-08-11 13:37
> To: 
> systemd-devel@lists.freedesktop.org
> Subject: Re: [systemd-devel] Outputting STDOUT and STDERR to current $SSH_TTY
> 
> 
> 
> Am 11.08.2017 um 04:29 schrieb Sergei Franco:
> > I am trying to retain behaviour of the startup scripts the same as sysvinit.
> > For example, on sysv the script outputs feedback to the console when the
> > stopping and starting is done via console.
> > With systemd all output is "hidden", and to see the output you need to
> > check the journal (or syslog if configured). There is no feedback.
> >
> > Is it possible to configure StandardOutput=tty (and StandardErrror)
> > where TTYPath= is set to current $SSH_TTY env (be that
> > /devpts/0,/dev/pts/1 or /dev/pts/n)?
> 
> no, because PID1 is a completly different process than any TTY
> ___
> systemd-devel mailing list
> systemd-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/systemd-devel

> ___
> systemd-devel mailing list
> systemd-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/systemd-devel

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Hashmap iteration is too costly

2017-07-17 Thread vcaputo
On Mon, Jul 17, 2017 at 10:12:12AM +0200, Lennart Poettering wrote:
> On Fri, 14.07.17 03:18, vcap...@pengaru.com (vcap...@pengaru.com) wrote:
> 
> > The current hashmap iteration incurs at least one function call per
> > iteration, and in my observations using gcc 6.3 w/-g -O2, it's two:
> > 
> >  internal_hashmap_iterate()
> >  hashmap_iterate_entry()
> > 
> > for every element in the hashmap.
> > 
> > In profiles of `journalctl -b --no-pager` having multiple (50) log files,
> > the hashmap iteration shows up as ~10% of the CPU in callgrind
> > output.
> 
> I wonder if it wouldn't be easiest to simply bypass hashmap native
> iteration for this case. i.e. add a linked list for the open journal
> files which we can easily traverse in O(1) time?
> 

What part of hashmap_iterate_in_insertion_order() grows in complexity with
increasing N?

> The hashmap implementation has been optimized to require very little
> memory, because we have so many hashmap/sets around in PID 1. It's not
> optimized to make iteration fast though.
> 

I don't believe these are mutually exclusive.

Regards,
Vito Caputo
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Hashmap iteration is too costly

2017-07-14 Thread vcaputo
The current hashmap iteration incurs at least one function call per
iteration, and in my observations using gcc 6.3 w/-g -O2, it's two:

 internal_hashmap_iterate()
 hashmap_iterate_entry()

for every element in the hashmap.

In profiles of `journalctl -b --no-pager` having multiple (50) log files,
the hashmap iteration shows up as ~10% of the CPU in callgrind output.

Since it appears effort has been made to keep the hashmap structure opaque,
making these iterators zero-cost seems non-trivial.

As an experiment, I added two new hashmap functions:

 hashmap_get_array()
 ordered_hashmap_get_array()

These return an allocated array snapshot of the hashmap's contents, in the
same order they would be iterated in.  Their implementation has direct
access to the hashmap internals, and uses inlined versions of the
type-specific iterator functions to avoid the per-element function call
overhead of the usual iterators.

You can find the code here:

https://github.com/systemd/systemd/compare/303608...vcaputo:hashmap_get_array

The last commit includes some numbers, which I'll include them here for
convenience:

Before:

  # time ./journalctl -b -1 --no-pager > /dev/null
   real0m16.452s
   user0m16.371s
   sys 0m0.066s

  # time ./journalctl -b -1 --no-pager > /dev/null
   real0m16.375s
   user0m16.310s
   sys 0m0.057s

  # time ./journalctl -b -1 --no-pager > /dev/null
   real0m16.390s
   user0m16.329s
   sys 0m0.047s

 After:

  # time ./journalctl -b -1 --no-pager > /dev/null
   real0m15.827s
   user0m15.769s
   sys 0m0.052s

  # time ./journalctl -b -1 --no-pager > /dev/null
   real0m15.650s
   user0m15.592s
   sys 0m0.050s

  # time ./journalctl -b -1 --no-pager > /dev/null
   real0m15.646s
   user0m15.574s
   sys 0m0.056s


Note this is adding the cost of a malloc and free, and copying the
elements.  It's still faster than the existing hashmap iterator.

I'm not sold on this being a good solution, it's more meant to shed light
on what I feel is a bit of a wart and hopefully start some discussion.

Regards,
Vito Caputo
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-10 Thread vcaputo
On Sat, Jul 08, 2017 at 03:49:11AM +, Zbigniew Jędrzejewski-Szmek wrote:
> On Fri, Jul 07, 2017 at 03:54:09PM -0700, vcap...@pengaru.com wrote:
> > On Fri, Jul 07, 2017 at 10:34:22PM +, Zbigniew Jędrzejewski-Szmek wrote:
> > > On Fri, Jul 07, 2017 at 02:35:16PM -0700, vcap...@pengaru.com wrote:
> > > > On Fri, Jul 07, 2017 at 01:49:54PM -0700, vcap...@pengaru.com wrote:
> > > > > On Fri, Jul 07, 2017 at 08:37:08PM +, Mantas Mikulėnas wrote:
> > > > > > Back when that commit was made, didn't glibc cache the getpid() 
> > > > > > result in
> > > > > > userspace? That would explain why it was not noticed.
> > > > >
> > > > > Hmm, this crossed my mind, and come to think of it I did a 
> > > > > dist-upgrade
> > > > > from Debian jessie to stretch overnight machine and haven't rebooted.
> > > > > 
> > > > > Perhaps the vdso isn't working and the costly getpid() is a red 
> > > > > herring, will
> > > > > reboot and retest to confirm.
> > > > > 
> > > > 
> > > > It appears Debian has a glibc patch to disable the caching (I was 
> > > > unaware
> > > > such an elaborate dance was being performed to cache this!)
> > > > 
> > > > https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/debian/patches/any?id=5850253f509604dd46a6131acc057ea26e1588ba
> > > 
> > > Do we know the justification for this patch?
> > > 
> > 
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909
> > 
> > Which references this upstream glibc bug:
> > 
> > https://sourceware.org/bugzilla/show_bug.cgi?id=19957
> > https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d
> > 
> > 
> > > > Unsure where I stand on core system software assuming certain syscalls 
> > > > are
> > > > always going to be exceptionally cheap though...
> > > 
> > > Optimization is never in a vacuum. If glibc does something cheaply, it
> > > seems reasonable to take advantage of it.
> > > 
> > 
> > Except there's always a risk of these things regressing to normal syscalls,
> > and one has to weigh the utility against that.  It's unclear to me what
> > significant utility having the sd-journal API police changing pids by
> > calling getpid() at every public entrypoint is bringing to the table.
> 
> So it seems the issue has been fixed in glibc upstream more than a year
> ago, and it doesn't seem to make sense to optimize current systemd git for
> that.
> 

Can you provide a commit id?  I took a glance at sourcewaire.org/git/gitweb.cgi
for getpid commits and didn't see anything relevant since the removal[1].


> I see the argument that the getpid() checks are a bit excessive. Is their
> overhead actually noticeable with current glibc?
> 

On my spare arch system I still see gratuitous getpid() calls from
journalctl, which is on glibc 2.5-2.

The pollution of strace output alone due to these checks is nuisance enough
for me to want the checks removed, considering their only value is to catch
programmer errors.  There's an abundance of potential programmer errors
we're not making any effort to prevent, why is this one so privileged that
it warrants policing?

I appreciate Lennart's point about the hazards of forking from threaded
programs.  It just doesn't seem like a valid rationalization for sprinkling
a system library's entrypoints with getpid() calls to catch this in
production.

Considering the associated potential costs, and the historic controversy
surrounding the caching of this particular syscall[2] I'm a bit confused by
the status quo.

Cheers,
Vito Caputo

1: 
https://sourceware.org/git/gitweb.cgi?p=glibc.git=search=HEAD=commit=getpid
2: http://yarchive.net/comp/linux/getpid_caching.html

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-07 Thread vcaputo
On Fri, Jul 07, 2017 at 10:34:22PM +, Zbigniew Jędrzejewski-Szmek wrote:
> On Fri, Jul 07, 2017 at 02:35:16PM -0700, vcap...@pengaru.com wrote:
> > On Fri, Jul 07, 2017 at 01:49:54PM -0700, vcap...@pengaru.com wrote:
> > > On Fri, Jul 07, 2017 at 08:37:08PM +, Mantas Mikulėnas wrote:
> > > > Back when that commit was made, didn't glibc cache the getpid() result 
> > > > in
> > > > userspace? That would explain why it was not noticed.
> > >
> > > Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
> > > from Debian jessie to stretch overnight machine and haven't rebooted.
> > > 
> > > Perhaps the vdso isn't working and the costly getpid() is a red herring, 
> > > will
> > > reboot and retest to confirm.
> > > 
> > 
> > It appears Debian has a glibc patch to disable the caching (I was unaware
> > such an elaborate dance was being performed to cache this!)
> > 
> > https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/debian/patches/any?id=5850253f509604dd46a6131acc057ea26e1588ba
> 
> Do we know the justification for this patch?
> 

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909

Which references this upstream glibc bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=19957
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d


> > Unsure where I stand on core system software assuming certain syscalls are
> > always going to be exceptionally cheap though...
> 
> Optimization is never in a vacuum. If glibc does something cheaply, it
> seems reasonable to take advantage of it.
> 

Except there's always a risk of these things regressing to normal syscalls,
and one has to weigh the utility against that.  It's unclear to me what
significant utility having the sd-journal API police changing pids by
calling getpid() at every public entrypoint is bringing to the table.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-07 Thread vcaputo
On Fri, Jul 07, 2017 at 01:49:54PM -0700, vcap...@pengaru.com wrote:
> On Fri, Jul 07, 2017 at 08:37:08PM +, Mantas Mikulėnas wrote:
> > Back when that commit was made, didn't glibc cache the getpid() result in
> > userspace? That would explain why it was not noticed.
>
> Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
> from Debian jessie to stretch overnight machine and haven't rebooted.
> 
> Perhaps the vdso isn't working and the costly getpid() is a red herring, will
> reboot and retest to confirm.
> 

It appears Debian has a glibc patch to disable the caching (I was unaware
such an elaborate dance was being performed to cache this!)

https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/debian/patches/any?id=5850253f509604dd46a6131acc057ea26e1588ba

Unsure where I stand on core system software assuming certain syscalls are
always going to be exceptionally cheap though...
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-07 Thread vcaputo
On Fri, Jul 07, 2017 at 01:49:54PM -0700, vcap...@pengaru.com wrote:
> On Fri, Jul 07, 2017 at 08:37:08PM +, Mantas Mikulėnas wrote:
> > Back when that commit was made, didn't glibc cache the getpid() result in
> > userspace? That would explain why it was not noticed.
>
> Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
> from Debian jessie to stretch overnight machine and haven't rebooted.
> 
> Perhaps the vdso isn't working and the costly getpid() is a red herring, will
> reboot and retest to confirm.

Nope, same getpid() performance hit after rebooting.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-07 Thread vcaputo
Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
from Debian jessie to stretch overnight machine and haven't rebooted.

Perhaps the vdso isn't working and the costly getpid() is a red herring, will
reboot and retest to confirm.


On Fri, Jul 07, 2017 at 08:37:08PM +, Mantas Mikulėnas wrote:
> Back when that commit was made, didn't glibc cache the getpid() result in
> userspace? That would explain why it was not noticed.
> 
> On Fri, Jul 7, 2017, 23:18  wrote:
> 
> > In doing some casual journalctl profiling and stracing, it became apparent
> > that `journalctl -b --no-pager` runs across a significant quantity of logs,
> > ~10% of the time was thrown away on getpid() calls due to commmit a65f06b.
> >
> > As-is:
> >  # time ./journalctl -b --no-pager > /dev/null
> >
> >  real0m11.033s
> >  user0m10.084s
> >  sys 0m0.943s
> >
> >
> > After changing journal_pid_changed() to simply return 1:
> >  # time ./journalctl -b --no-pager > /dev/null
> >
> >   real0m9.641s
> >   user0m9.449s
> >   sys 0m0.191s
> >
> >
> > More system time is being expended in repeated getpid() calls than write(),
> > see the strace:
> >
> >  12:51:56.939287 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57
> > <0.001276>
> >  12:51:56.940633 getpid()= 10713 <0.32>
> >  12:51:56.940732 getpid()= 10713 <0.12>
> >  12:51:56.940801 getpid()= 10713 <0.32>
> >  12:51:56.940867 getpid()= 10713 <0.41>
> >  12:51:56.940942 getpid()= 10713 <0.41>
> >  12:51:56.941047 getpid()= 10713 <0.12>
> >  12:51:56.941117 getpid()= 10713 <0.12>
> >  12:51:56.941185 getpid()= 10713 <0.11>
> >  12:51:56.941253 getpid()= 10713 <0.11>
> >  12:51:56.941320 getpid()= 10713 <0.39>
> >  12:51:56.941395 getpid()= 10713 <0.41>
> >  12:51:56.941494 getpid()= 10713 <0.11>
> >  12:51:56.941561 getpid()= 10713 <0.12>
> >  12:51:56.941629 getpid()= 10713 <0.39>
> >  12:51:56.942942 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57
> > <0.58>
> >  12:51:56.943052 getpid()= 10713 <0.39>
> >  12:51:56.943156 getpid()= 10713 <0.17>
> >  12:51:56.943230 getpid()= 10713 <0.18>
> >  12:51:56.943305 getpid()= 10713 <0.12>
> >  12:51:56.943374 getpid()= 10713 <0.17>
> >  12:51:56.943449 getpid()= 10713 <0.11>
> >  12:51:56.943517 getpid()= 10713 <0.11>
> >  12:51:56.943585 getpid()= 10713 <0.11>
> >  12:51:56.943652 getpid()= 10713 <0.11>
> >  12:51:56.943721 getpid()= 10713 <0.30>
> >  12:51:56.943796 getpid()= 10713 <0.41>
> >  12:51:56.943870 getpid()= 10713 <0.41>
> >  12:51:56.943944 getpid()= 10713 <0.41>
> >  12:51:56.944061 getpid()= 10713 <0.001334>
> >  12:51:56.945459 write(1, "Jul 07 09:25:23 x61s unknown"..., 56) = 56
> > <0.18>
> >  12:51:56.945544 getpid()= 10713 <0.17>
> >  12:51:56.945620 getpid()= 10713 <0.17>
> >  12:51:56.945694 getpid()= 10713 <0.12>
> >  12:51:56.945763 getpid()= 10713 <0.11>
> >  12:51:56.945832 getpid()= 10713 <0.12>
> >  12:51:56.945901 getpid()= 10713 <0.11>
> >  12:51:56.945969 getpid()= 10713 <0.11>
> >  12:51:56.946048 getpid()= 10713 <0.13>
> >  12:51:56.946118 getpid()= 10713 <0.24>
> >  12:51:56.946188 getpid()= 10713 <0.47>
> >  12:51:56.946277 getpid()= 10713 <0.41>
> >  12:51:56.946353 getpid()= 10713 <0.41>
> >  12:51:56.946428 getpid()= 10713 <0.40>
> >  12:51:56.946539 getpid()= 10713 <0.001363>
> >
> > As this is public sd-journal API, it's somewhat set in stone.  However,
> > there's nothing preventing the systemd-internal tooling from linking with
> > a less defensive/conformant underlying implementation shared with the
> > public
> > API implementation where these kinds of overheads can be preserved.
> >
> > For the curious; the logs being processed for this boot are 48 * 8MiB on
> > SSD,
> > 1.8Ghz Core2 Duo, 4.12 kernel.
> >
> > Regards,
> > Vito Caputo
> > ___
> > systemd-devel mailing list
> > systemd-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/systemd-devel
> >
> -- 
> 
> Mantas Mikulėnas 
> Sent from my phone
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org

[systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-07 Thread vcaputo
In doing some casual journalctl profiling and stracing, it became apparent
that `journalctl -b --no-pager` runs across a significant quantity of logs,
~10% of the time was thrown away on getpid() calls due to commmit a65f06b.

As-is:
 # time ./journalctl -b --no-pager > /dev/null

 real0m11.033s
 user0m10.084s
 sys 0m0.943s


After changing journal_pid_changed() to simply return 1:
 # time ./journalctl -b --no-pager > /dev/null

  real0m9.641s
  user0m9.449s
  sys 0m0.191s


More system time is being expended in repeated getpid() calls than write(),
see the strace:

 12:51:56.939287 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.001276>
 12:51:56.940633 getpid()= 10713 <0.32>
 12:51:56.940732 getpid()= 10713 <0.12>
 12:51:56.940801 getpid()= 10713 <0.32>
 12:51:56.940867 getpid()= 10713 <0.41>
 12:51:56.940942 getpid()= 10713 <0.41>
 12:51:56.941047 getpid()= 10713 <0.12>
 12:51:56.941117 getpid()= 10713 <0.12>
 12:51:56.941185 getpid()= 10713 <0.11>
 12:51:56.941253 getpid()= 10713 <0.11>
 12:51:56.941320 getpid()= 10713 <0.39>
 12:51:56.941395 getpid()= 10713 <0.41>
 12:51:56.941494 getpid()= 10713 <0.11>
 12:51:56.941561 getpid()= 10713 <0.12>
 12:51:56.941629 getpid()= 10713 <0.39>
 12:51:56.942942 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.58>
 12:51:56.943052 getpid()= 10713 <0.39>
 12:51:56.943156 getpid()= 10713 <0.17>
 12:51:56.943230 getpid()= 10713 <0.18>
 12:51:56.943305 getpid()= 10713 <0.12>
 12:51:56.943374 getpid()= 10713 <0.17>
 12:51:56.943449 getpid()= 10713 <0.11>
 12:51:56.943517 getpid()= 10713 <0.11>
 12:51:56.943585 getpid()= 10713 <0.11>
 12:51:56.943652 getpid()= 10713 <0.11>
 12:51:56.943721 getpid()= 10713 <0.30>
 12:51:56.943796 getpid()= 10713 <0.41>
 12:51:56.943870 getpid()= 10713 <0.41>
 12:51:56.943944 getpid()= 10713 <0.41>
 12:51:56.944061 getpid()= 10713 <0.001334>
 12:51:56.945459 write(1, "Jul 07 09:25:23 x61s unknown"..., 56) = 56 <0.18>
 12:51:56.945544 getpid()= 10713 <0.17>
 12:51:56.945620 getpid()= 10713 <0.17>
 12:51:56.945694 getpid()= 10713 <0.12>
 12:51:56.945763 getpid()= 10713 <0.11>
 12:51:56.945832 getpid()= 10713 <0.12>
 12:51:56.945901 getpid()= 10713 <0.11>
 12:51:56.945969 getpid()= 10713 <0.11>
 12:51:56.946048 getpid()= 10713 <0.13>
 12:51:56.946118 getpid()= 10713 <0.24>
 12:51:56.946188 getpid()= 10713 <0.47>
 12:51:56.946277 getpid()= 10713 <0.41>
 12:51:56.946353 getpid()= 10713 <0.41>
 12:51:56.946428 getpid()= 10713 <0.40>
 12:51:56.946539 getpid()= 10713 <0.001363>

As this is public sd-journal API, it's somewhat set in stone.  However,
there's nothing preventing the systemd-internal tooling from linking with
a less defensive/conformant underlying implementation shared with the public
API implementation where these kinds of overheads can be preserved.

For the curious; the logs being processed for this boot are 48 * 8MiB on SSD,
1.8Ghz Core2 Duo, 4.12 kernel.

Regards,
Vito Caputo
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel