On Wed, Apr 27, 2022 at 10:09 AM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote:

> Hi!
>
> Having written an RFC 3164 compatible syslog daemon, I noticed that systemd
> created syslog messages with non-ASCII characters.
> The problem is that a remote syslogd can hardly guess the correct character
> set (I'm using rsyslog to forward local messages to a remote server).
>
> Example of such message:
> systemd-tmpfiles[3311]: [/usr/lib/tmpfiles.d/svnserve.conf:1] Line references
> path below legacy directory /var/run/, updating /var/run/svnserve →
> /run/svnserve; please update the tmpfiles.d/ drop-in file accordingly.
>
> (The arrow is encoded as three bytes (\xe2\x86\x92))
>
> RFC 5425 syslog messages require the use of a BOM (%xEF.BB.BF) at the
> beginning of a message if the message used UTF-8:
>
> MSG = MSG-ANY / MSG-UTF8
> MSG-ANY = *OCTET ; not starting with BOM
> MSG-UTF8 = BOM UTF-8-STRING
> BOM = %xEF.BB.BF
>
> Wouldn't it make sense to add such a BOM for RFC 3164 syslog messages also if
> non-ASCII (i.e.: UTF-8) encoded characters are used?
>

RFC 3164 over a local socket from journald to local rsyslogd? The local
rsyslogd already knows if messages are UTF-8 because the system's $LANG
(well, nl_langinfo) says so. And if rsyslog can't trust that for some reason
(e.g. because a user might have a different locale), then systemd-journald
won't be able to trust it either, so it won't know whether it could add the
BOM.

RFC 3164 over the network to a remote server? Outside the scope for systemd,
since it doesn't generate the network packets; your local rsyslogd forwarder
does. (Also, why RFC 3164 and not 5425?)

Generally, if a message successfully decodes as UTF-8 then it's most likely
actual UTF-8 (and if UTF-8 decode fails then you fall back to ISO8859-1).
Various old systems get away with this without needing a UTF-8 BOM.

-- 
Mantas Mikulėnas
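
For illustration, the decode-then-fall-back heuristic from the last paragraph could be sketched on the receiving side roughly as follows. This is only a sketch, not what rsyslog or journald actually do: the function name `decode_syslog_msg` and the returned encoding labels are made up for the example, and the BOM check follows the MSG grammar quoted above (which, as far as I can tell, is the one defined in RFC 5424, section 6).

```python
from typing import Tuple

# UTF-8 byte order mark, written %xEF.BB.BF in the quoted grammar.
BOM = b"\xef\xbb\xbf"


def decode_syslog_msg(raw: bytes) -> Tuple[str, str]:
    """Decode the MSG part of a syslog message.

    Hypothetical helper: returns (text, label), where label says
    which decoding path was taken.
    """
    if raw.startswith(BOM):
        # The sender explicitly marked the payload as UTF-8; strip the BOM.
        # Invalid sequences are replaced instead of raising.
        return raw[len(BOM):].decode("utf-8", errors="replace"), "utf-8-bom"
    try:
        # No BOM: if the bytes are valid UTF-8, assume that is what they are.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO 8859-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    # The arrow from the systemd-tmpfiles message is the bytes e2 86 92.
    print(decode_syslog_msg(b"/var/run/svnserve \xe2\x86\x92 /run/svnserve"))
    # A lone 0xe9 is not valid UTF-8, so this takes the fallback path.
    print(decode_syslog_msg(b"caf\xe9"))
```

The fallback is lossless at the byte level, since every byte maps to exactly one ISO 8859-1 code point, but text that was really in some other 8-bit charset will still come out wrong; the heuristic only answers "UTF-8 or not".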