Re: [systemd-devel] systemd-journald may crash during memory pressure

2018-02-09 Thread Uoti Urpala
On Fri, 2018-02-09 at 12:41 +0100, Lennart Poettering wrote:
> These last log lines indicate journald wasn't scheduled for a long
> time, which caused the watchdog to hit, and journald was
> aborted. Consider increasing the watchdog timeout if your system is
> indeed that loaded and that's supposed to be an OK thing...

BTW I've seen the same behavior on a system with a single active
process that uses enough memory to trigger significant swap use. I
wonder if there has been a kernel regression that causes misbehavior
when swapping? The problems aren't specific to journald; the desktop
environment can freeze completely too, for example.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-journald may crash during memory pressure

2018-02-09 Thread Lennart Poettering
On Thu, 08.02.18 23:50, Kai Krakow (hurikha...@gmail.com) wrote:

> Hello!
> 
> During memory pressure and/or high load, journald may crash. This is
> probably due to the mmap-based design, but it really should not do this.
> 
> On 32-bit systems, we are seeing such crashes constantly although
> gigabytes of memory are still available (it's a 32-bit userland running
> on a 64-bit kernel).
> 
> 
> [82988.670323] systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
> [82988.670684] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.

These last log lines indicate journald wasn't scheduled for a long
time, which caused the watchdog to hit, and journald was
aborted. Consider increasing the watchdog timeout if your system is
indeed that loaded and that's supposed to be an OK thing...
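
As a rough sketch of what raising the timeout could look like: a drop-in
for the journald unit that overrides WatchdogSec=. The file name
watchdog.conf and the 10min value are assumptions chosen for
illustration, not values taken from this thread. As far as I know the
stock systemd-journald.service sets WatchdogSec=3min.

```ini
# /etc/systemd/system/systemd-journald.service.d/watchdog.conf
# (hypothetical file name; any *.conf in this drop-in directory is read)
[Service]
# Assumed example value; pick something that matches how long the
# system may plausibly stall under load.
WatchdogSec=10min
# Setting WatchdogSec=0 instead would disable the watchdog for this service.
```

After creating the file, run "systemctl daemon-reload" and restart
systemd-journald.service for the new timeout to take effect.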

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] systemd-journald may crash during memory pressure

2018-02-09 Thread Kai Krakow
On Thu, 08 Feb 2018 18:12:23 -0800, vcaputo wrote:

> Note the logs you've pasted portray a watchdog timeout which resulted in
> SIGABRT and a subsequent core dump.
> 
> This is not really a journald "crash", and you can increase the watchdog
> timeout or disable it entirely to make it more tolerant of thrashing.
> 
> What I presume happened is the system was thrashing and a page fault in
> the mapped journal took too long to complete.

Oh thanks, this is a good pointer. I'll try that.


> On Thu, Feb 08, 2018 at 11:50:45PM +0100, Kai Krakow wrote:
>> Hello!
>> 
>> During memory pressure and/or high load, journald may crash. This is
>> probably due to the mmap-based design, but it really should not do this.
>> 
>> On 32-bit systems, we are seeing such crashes constantly although
>> gigabytes of memory are still available (it's a 32-bit userland running
>> on a 64-bit kernel).
>> 
>> 
>> [82988.670323] systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
>> [82988.670684] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
>> [82988.685928] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
>> [82988.709575] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2.
>> [82988.717390] systemd[1]: Stopped Flush Journal to Persistent Storage.
>> [82988.717411] systemd[1]: Stopping Flush Journal to Persistent Storage...
>> [82988.726303] systemd[1]: Stopped Journal Service.
>> [82988.844462] systemd[1]: Starting Journal Service...
>> [82993.633781] systemd-coredump[22420]: MESSAGE=Process 461 (systemd-journal) of user 0 dumped core.
>> [82993.633811] systemd-coredump[22420]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.3d492c866f254fb981f916c6c3918046.461.151812537700.lz4
>> [82993.633813] systemd-coredump[22420]: Stack trace of thread 461:
>> [82993.633814] systemd-coredump[22420]: #0  0x7f940241d4dd journal_file_move_to_object (libsystemd-shared-237.so)
>> [82993.633815] systemd-coredump[22420]: #1  0x7f940241e910 journal_file_find_data_object_with_hash (libsystemd-shared-237.so)
>> [82993.633816] systemd-coredump[22420]: #2  0x7f940241fe81 journal_file_append_data (libsystemd-shared-237.so)
>> [82993.633817] systemd-coredump[22420]: #3  0x556a343ae9ea write_to_journal (systemd-journald)
>> [82993.633819] systemd-coredump[22420]: #4  0x556a343b0974 server_dispatch_message (systemd-journald)
>> [82993.633820] systemd-coredump[22420]: #5  0x556a343b24bb stdout_stream_log (systemd-journald)
>> [82993.633821] systemd-coredump[22420]: #6  0x556a343b2afe stdout_stream_line (systemd-journald)
>> [82993.723157] systemd-coredum: 7 output lines suppressed due to ratelimiting
>> [83002.830610] systemd-journald[22424]: File /var/log/journal/121b87ca633e8ac001665668001b/system.journal corrupted or uncleanly shut down, renaming and replacing.
>> [83014.774538] systemd[1]: Started Journal Service.
>> [83119.277143] systemd-journald[22424]: File /var/log/journal/121b87ca633e8ac001665668001b/user-500.journal corrupted or uncleanly shut down, renaming and replacing.
>> 
>> 
>> -- 
>> Regards,
>> Kai
>> 
>> Replies to list-only preferred.
>> 

-- 
Regards,
Kai

Replies to list-only preferred.
