Re: [systemd-devel] [systemd-devel] Antw: [EXT] Re: Q: non-ASCII in syslog

2022-04-28 Thread Mantas Mikulėnas
On Thu, Apr 28, 2022 at 1:26 PM Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:

> >>> Lennart Poettering wrote on 28.04.2022 at 10:27 in message:
> > On Thu, 28.04.22 09:32, Ulrich Windl (ulrich.wi...@rz.uni-regensburg.de) wrote:
> >
> >> Actually I wasn't quite sure about the default config in SLES12.
> >> It seems the flow is journald -> local rsyslogd -> remote syslogd
> >>
> >> > rsyslogd already knows if messages are UTF-8 because the system's $LANG
> >> > (well, nl_langinfo) says so. And if rsyslog can't trust that for some
> >> > reason (e.g. because a user might have a different locale), then
> >> > systemd-journald won't be able to trust it either, so it won't know whether
> >> > it could add the BOM.
> >>
> >> How could a remote syslog server know what the locale on the sending system
> >> is?
> >
> > Your local rsyslogd could add the BOM when it transforms journal
> > messages to syslog datagrams.
> >
> >> > RFC 3164 over the network to a remote server? Outside the scope for
> >> > systemd, since it doesn't generate the network packets; your local rsyslogd
> >> > forwarder does. (Also, why RFC 3164 and not 5425?)
> >>
> >> If you look outside the world of systemd, about 99% of systems create the
> >> RFC 3164 type of messages.
> >
> > That's a wild claim, and simply wrong actually.
>
> Well actually as systemd cannot send syslog messages to remote, which systems
> do you know that send RFC 5424 messages?
> Actually I know none here.
>

syslog-ng does with destination{syslog()}, rsyslogd does with
RSYSLOG_SyslogProtocol23Format; the HP switches at $WORK (and I think the
Cisco ones) didn't even have BSD-format as an option, always producing
5424-format.


> >
> > systemd is focussed on reality: we generate and process the same
> > format glibc generates.
>
> I'm wondering which API all those programs use that create correct syslog
> entries.
>

It's not that they create correct syslog entries, it's that the syslogd
(well, the /dev/log listener, so including journald) *parses and rebuilds*
the entries that come from the API before storing them anywhere.

Whether you use rsyslog or syslog-ng, they don't just dump program-provided
data to /var/log – they both parse the input into date + hostname + pid +
message, then reformat according to whatever output format is specified.
(For example, we have syslog-ng configured to write RFC3339 timestamps.)
Journald also does the same by design.
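To make the parse-and-rebuild step concrete, here is a minimal Python sketch. It assumes a glibc-style datagram of the form `<PRI>Mmm dd hh:mm:ss TAG[PID]: MSG` with no HOSTNAME, and that the caller supplies the year and UTC offset, which RFC 3164 timestamps lack. The names `parse_local_datagram` and `rebuild` and the sample tag `demo[4242]` are made up for illustration; this is not how rsyslog, syslog-ng or journald are actually implemented.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for a glibc-style datagram on /dev/log:
# "<PRI>" + "Mmm dd hh:mm:ss" + " " + "TAG[PID]: " + message, with no HOSTNAME.
LOCAL_RE = re.compile(
    r"<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<tag>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)",
    re.DOTALL,
)


def parse_local_datagram(data: bytes) -> dict:
    """Split a datagram into pri/ts/tag/pid/msg; raise ValueError if it doesn't match."""
    text = data.decode("utf-8", errors="replace")  # undecodable bytes become U+FFFD
    m = LOCAL_RE.fullmatch(text)
    if m is None:
        raise ValueError("not a glibc-style syslog datagram")
    fields = m.groupdict()  # "pid" is None when the sender did not use LOG_PID
    fields["pri"] = int(fields["pri"])
    return fields


def rebuild(fields: dict, hostname: str, year: int, tz: timezone) -> str:
    """Re-emit an entry with an RFC 3339 timestamp and the given hostname.

    RFC 3164 timestamps carry neither a year nor a UTC offset, so both are supplied.
    """
    ts = datetime.strptime(f"{year} {fields['ts']}", "%Y %b %d %H:%M:%S").replace(tzinfo=tz)
    pid = f"[{fields['pid']}]" if fields["pid"] else ""
    return f"{ts.isoformat()} {hostname} {fields['tag']}{pid}: {fields['msg']}"


if __name__ == "__main__":
    f = parse_local_datagram(b"<31>Apr 28 11:08:32 demo[4242]: hello")
    print(rebuild(f, "host-name", 2022, timezone(timedelta(hours=2))))
    # 2022-04-28T11:08:32+02:00 host-name demo[4242]: hello
```

The point of the sketch is that the hostname and the output timestamp format are chosen by the daemon, not by the program that called syslog().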

-- 
Mantas Mikulėnas


[systemd-devel] Antw: Re: [systemd-devel] Antw: [EXT] Re: Q: non-ASCII in syslog

2022-04-28 Thread Ulrich Windl
>>> Lennart Poettering wrote on 28.04.2022 at 10:27 in message:
> On Thu, 28.04.22 09:32, Ulrich Windl (ulrich.wi...@rz.uni-regensburg.de) wrote:
> 
>> Actually I wasn't quite sure about the default config in SLES12.
>> It seems the flow is journald -> local rsyslogd -> remote syslogd
>>
>> > rsyslogd already knows if messages are UTF-8 because the system's $LANG
>> > (well, nl_langinfo) says so. And if rsyslog can't trust that for some
>> > reason (e.g. because a user might have a different locale), then
>> > systemd-journald won't be able to trust it either, so it won't know whether
>> > it could add the BOM.
>>
>> How could a remote syslog server know what the locale on the sending system
>> is?
> 
> Your local rsyslogd could add the BOM when it transforms journal
> messages to syslog datagrams.
> 
>> > RFC 3164 over the network to a remote server? Outside the scope for
>> > systemd, since it doesn't generate the network packets; your local rsyslogd
>> > forwarder does. (Also, why RFC 3164 and not 5425?)
>>
>> If you look outside the world of systemd, about 99% of systems create the
>> RFC 3164 type of messages.
> 
> That's a wild claim, and simply wrong actually.

Well actually as systemd cannot send syslog messages to remote, which systems
do you know that send RFC 5424 messages?
Actually I know none here.

> 
> I am pretty sure that more than 50% of syslog messages generated on
> this earth probably are synthesized by glibc's syslog() API. And that
> turns out to be neither conformant to RFC 3164 nor to RFC 5425.

No idea. Can you give an example?

> 
> What glibc sends is close to RFC 3164 but omits one key field that
> isn't really optional according to RFC 3164: the 'HOSTNAME' field.

Maybe the API is not used correctly. RFC 3164 says:
"A relay will add a TIMESTAMP and SHOULD add a HOSTNAME as follows (...)"
So when sending to any remote syslog server, a HOSTNAME should be present.
(It's like an MTA adding a Message-ID (and other fields) if none is present.)

Most notably, the RFC seems to allow a missing hostname initially.

> 
> systemd is focussed on reality: we generate and process the same
> format glibc generates.

I'm wondering which API all those programs use that create correct syslog
entries.
I tried with my own program:
It sends:
connect(1, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 0
sendto(1, "<31>Apr 28 11:08:32 iotwatch[239"..., 56, MSG_NOSIGNAL, NULL, 0) = 56

What's logged is:
Apr 28 11:08:32 host-name iotwatch[239...
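The `<31>` at the start of that datagram is the PRI field shared by RFC 3164 and RFC 5424: PRI = facility * 8 + severity, so 31 = 3 * 8 + 7, i.e. facility 3 (daemon) and severity 7 (debug). A small Python sketch of the decoding follows; the helper names are hypothetical, and the facility and severity names are lower-cased from glibc's LOG_* constants.

```python
# Facility names after glibc's <syslog.h> (LOG_KERN = 0 ... LOG_FTP = 11);
# facilities 12-15 have no glibc name and are shown as plain numbers here.
FACILITIES = ["kern", "user", "mail", "daemon", "auth", "syslog", "lpr",
              "news", "uucp", "cron", "authpriv", "ftp"]
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(pri: int) -> tuple[int, int]:
    """Split a PRI value into (facility, severity): PRI = facility * 8 + severity."""
    if not 0 <= pri <= 191:  # 191 = local7 (23) * 8 + debug (7)
        raise ValueError(f"PRI out of range: {pri}")
    return divmod(pri, 8)


def facility_name(facility: int) -> str:
    if facility >= 16:
        return f"local{facility - 16}"  # LOG_LOCAL0 .. LOG_LOCAL7
    if facility < len(FACILITIES):
        return FACILITIES[facility]
    return str(facility)


def pri_name(pri: int) -> str:
    facility, severity = decode_pri(pri)
    return f"{facility_name(facility)}.{SEVERITIES[severity]}"


if __name__ == "__main__":
    print(decode_pri(31), pri_name(31))  # (3, 7) daemon.debug
```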

Also, from the syntax the application sends, one cannot really tell
whether the hostname is missing.
Maybe the trick is that /dev/log is specified as the source for _local_ syslog
messages (so there is no reason or sense in supplying the local hostname).
Also, I'm not sure whether messages sent to /dev/log are covered by the RFC at all.

Regards,
Ulrich Windl

> 
> Lennart
> 
> --
> Lennart Poettering, Berlin





Re: [systemd-devel] Antw: [EXT] Re: Q: non-ASCII in syslog

2022-04-28 Thread Lennart Poettering
On Thu, 28.04.22 09:32, Ulrich Windl (ulrich.wi...@rz.uni-regensburg.de) wrote:

> Actually I wasn't quite sure about the default config in SLES12.
> It seems the flow is journald -> local rsyslogd -> remote syslogd
>
> > rsyslogd already knows if messages are UTF-8 because the system's $LANG
> > (well, nl_langinfo) says so. And if rsyslog can't trust that for some
> > reason (e.g. because a user might have a different locale), then
> > systemd-journald won't be able to trust it either, so it won't know whether
> > it could add the BOM.
>
> How could a remote syslog server know what the locale on the sending system
> is?

Your local rsyslogd could add the BOM when it transforms journal
messages to syslog datagrams.
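For comparison, here is a hedged Python sketch of the kind of datagram a forwarder could emit if it spoke RFC 5424 and added the BOM itself while converting a journal entry. It follows the RFC 5424 layout `<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG`, leaves MSGID and STRUCTURED-DATA as the nil value `-`, and adds the BOM only when the text is not plain ASCII. The function name and that last choice are assumptions for illustration, not rsyslog behaviour.

```python
from datetime import datetime, timedelta, timezone

BOM = b"\xef\xbb\xbf"
NILVALUE = "-"


def format_rfc5424(pri: int, ts: datetime, hostname: str, app_name: str,
                   procid: str, msg: str) -> bytes:
    """Build <PRI>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG as bytes.

    ts should be timezone-aware so isoformat() includes the UTC offset.
    MSGID and STRUCTURED-DATA are NILVALUE; header fields must be ASCII
    (encode("ascii") raises UnicodeEncodeError otherwise).
    """
    header = f"<{pri}>1 {ts.isoformat()} {hostname} {app_name} {procid} {NILVALUE} {NILVALUE} "
    body = msg.encode("utf-8")
    if not body.isascii():
        body = BOM + body  # MSG-UTF8: announce the encoding explicitly
    return header.encode("ascii") + body


if __name__ == "__main__":
    ts = datetime(2022, 4, 28, 11, 8, 32, tzinfo=timezone(timedelta(hours=2)))
    print(format_rfc5424(31, ts, "host-name", "demo", "4242", "hello"))
    # b'<31>1 2022-04-28T11:08:32+02:00 host-name demo 4242 - - hello'
```

Keeping the BOM off pure-ASCII messages leaves them byte-identical to a BOM-less MSG; whether receivers tolerate the BOM on the others is the compatibility question debated in this thread.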

> > RFC 3164 over the network to a remote server? Outside the scope for
> > systemd, since it doesn't generate the network packets; your local rsyslogd
> > forwarder does. (Also, why RFC 3164 and not 5425?)
>
> If you look outside the world of systemd, about 99% of systems create the RFC
> 3164 type of messages.

That's a wild claim, and simply wrong actually.

I am pretty sure that more than 50% of syslog messages generated on
this earth probably are synthesized by glibc's syslog() API. And that
turns out to be neither conformant to RFC 3164 nor to RFC 5425.

What glibc sends is close to RFC 3164 but omits one key field that
isn't really optional according to RFC 3164: the 'HOSTNAME' field.

systemd is focussed on reality: we generate and process the same
format glibc generates.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] Antw: [EXT] Re: Q: non-ASCII in syslog

2022-04-28 Thread Lennart Poettering
On Thu, 28.04.22 09:37, Ulrich Windl (ulrich.wi...@rz.uni-regensburg.de) wrote:

> > There's plenty software that doesn't support RFC 5425, and putting a
> > BOM first is certainly not implemented in any of those. I think BOM is
> > hideous and defaulting to UTF-8 generally safe. If we'd put BOM first,
> > these messages would likely not be compatible with a large variety of
> > consumers anymore, because they can't handle BOM. This would be worse
>
> That's a non-argument:
> You say you don't adhere to any of the standards, and claim if you would do,
> things would break. ???

Yes, RFC 5425 is implemented by some logging infra, but it isn't by
all, and messages that are valid by rfc 5425 are not necessarily
compatible with messages generated/processed by software not knowing
rfc 5425. Adding the BOM is a sure way to guarantee software that
doesn't implement rfc 5425 won't be able to process your messages
anymore.
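A small Python illustration of that compatibility concern, assuming a consumer that knows nothing about RFC 5424 and simply decodes the MSG bytes as UTF-8 (or Latin-1) before matching on the text. The sample message is adapted from the systemd-tmpfiles example quoted in this thread; the helper names are hypothetical.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF, the BOM RFC 5424 puts before MSG-UTF8


def naive_decode(msg: bytes) -> str:
    # BOM-unaware consumer: a plain UTF-8 decode keeps U+FEFF as the first character.
    return msg.decode("utf-8")


def bom_aware_decode(msg: bytes) -> str:
    # Python's "utf-8-sig" codec drops one leading BOM if present, else acts like "utf-8".
    return msg.decode("utf-8-sig")


if __name__ == "__main__":
    msg = BOM + "Line references path below legacy directory /var/run/".encode("utf-8")
    print(naive_decode(msg).startswith("Line"))      # False: text starts with U+FEFF
    print(bom_aware_decode(msg).startswith("Line"))  # True
    # A Latin-1 consumer renders the three BOM bytes as the visible junk "ï»¿":
    print(msg[:3].decode("latin-1") == "\u00ef\u00bb\u00bf")  # True
```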

systemd's support for syslog (both on the generating and the consuming
side) is modelled after glibc's logging implementation, nothing
else. It also doesn't do BOM, hence we don't either.

> > than the status quo I am sure, since if we just send UTF-8 things
> > should generally just work fine for any software that either a) also
> > defaults to UTF-8 when encountering an 8bit char or b) is agnostic to
> > charsets and just passes data through.
>
> Yes, put the head in the sand hoping problems are gone when you look up
> again... ;-)

I am pretty sure by inserting the BOM you create more
incompatibilities than you solve.

> > So, yeah, we might be stretching standards and tradition a bit, but
> > it actually works out quite well so far.
>
> A good argument for driving without a safety belt, BTW.

This comparison makes no sense. Please be civil.

Lennart

--
Lennart Poettering, Berlin


[systemd-devel] Antw: [EXT] Re: Q: non-ASCII in syslog

2022-04-28 Thread Ulrich Windl
>>> Lennart Poettering wrote on 27.04.2022 at 13:10 in message:
> On Wed, 27.04.22 09:09, Ulrich Windl (ulrich.wi...@rz.uni-regensburg.de) wrote:
> 
>> Hi!
>>
>> Having written an RFC 3164 compatible syslog daemon, I noticed that systemd
>> created syslog messages with non-ASCII characters.
>> The problem is that a remote syslogd can hardly guess the correct character
>> set (I'm using rsyslog to forward local messages to a remote server).
> 
> It's 2022. I think at this point, software should always assume the
> charset is UTF-8 if it doesn't have a reason to believe otherwise.
> 
> It's kinda what we started to do all across our codebase really. We'll
> use UTF-8 for everything by default. For some things where people
> complain sufficiently loudly we'll conditionalize them so that we have
> some fallback in place if we know for sure UTF-8 is not OK, but the
> default we do is always and everywhere UTF-8.
> 
>> Example of such message:
>> systemd-tmpfiles[3311]: [/usr/lib/tmpfiles.d/svnserve.conf:1] Line references
>> path below legacy directory /var/run/, updating /var/run/svnserve →
>> /run/svnserve; please update the tmpfiles.d/ drop-in file accordingly.
>>
>> (The arrow is encoded as three bytes (\xe2\x86\x92))
>>
>> RFC 5425 syslog messages require the use of a BOM (%xEF.BB.BF) at the
>> beginning of a message if the message used UTF-8:
> 
> We do not implement RFC 5425, as glibc doesn't support that. In fact
> we don't even implement RFC 3164 in full, since glibc generates the
> messages in a very specific format only.
> 
>>
>>   MSG = MSG-ANY / MSG-UTF8
>>   MSG-ANY = *OCTET ; not starting with BOM
>>   MSG-UTF8 = BOM UTF-8-STRING
>>   BOM = %xEF.BB.BF
>>
>> Wouldn't it make sense to add such a BOM for RFC 3164 syslog messages also
>> if non-ASCII (i.e.: UTF-8) encoded characters are used?
> 
> There's plenty software that doesn't support RFC 5425, and putting a
> BOM first is certainly not implemented in any of those. I think BOM is
> hideous and defaulting to UTF-8 generally safe. If we'd put BOM first,
> these messages would likely not be compatible with a large variety of
> consumers anymore, because they can't handle BOM. This would be worse

That's a non-argument:
You say you don't adhere to any of the standards, and claim if you would do,
things would break. ???

> than the status quo I am sure, since if we just send UTF-8 things
> should generally just work fine for any software that either a) also
> defaults to UTF-8 when encountering an 8bit char or b) is agnostic to
> charsets and just passes data through.

Yes, put the head in the sand hoping problems are gone when you look up
again... ;-)

> 
> So, yeah, we might be stretching standards and tradition a bit, but
> it actually works out quite well so far.

A good argument for driving without a safety belt, BTW.

Regards,
Ulrich

> 
> Lennart
> 
> --
> Lennart Poettering, Berlin





[systemd-devel] Antw: [EXT] Re: Q: non-ASCII in syslog

2022-04-28 Thread Ulrich Windl
>>> Mantas Mikulenas wrote on 27.04.2022 at 12:03 in message:
> On Wed, Apr 27, 2022 at 10:09 AM Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:
> 
>> Hi!
>>
>> Having written an RFC 3164 compatible syslog daemon, I noticed that systemd
>> created syslog messages with non-ASCII characters.
>> The problem is that a remote syslogd can hardly guess the correct character
>> set (I'm using rsyslog to forward local messages to a remote server).
>>
>> Example of such message:
>> systemd-tmpfiles[3311]: [/usr/lib/tmpfiles.d/svnserve.conf:1] Line references
>> path below legacy directory /var/run/, updating /var/run/svnserve →
>> /run/svnserve; please update the tmpfiles.d/ drop-in file accordingly.
>>
>> (The arrow is encoded as three bytes (\xe2\x86\x92))
>>
>> RFC 5425 syslog messages require the use of a BOM (%xEF.BB.BF) at the
>> beginning of a message if the message used UTF-8:
>>
>>   MSG = MSG-ANY / MSG-UTF8
>>   MSG-ANY = *OCTET ; not starting with BOM
>>   MSG-UTF8 = BOM UTF-8-STRING
>>   BOM = %xEF.BB.BF
>>
>> Wouldn't it make sense to add such a BOM for RFC 3164 syslog messages also
>> if non-ASCII (i.e.: UTF-8) encoded characters are used?
>>
> 
> RFC 3164 over a local socket from journald to local rsyslogd? The local

Actually I wasn't quite sure about the default config in SLES12.
It seems the flow is journald -> local rsyslogd -> remote syslogd

> rsyslogd already knows if messages are UTF-8 because the system's $LANG
> (well, nl_langinfo) says so. And if rsyslog can't trust that for some
> reason (e.g. because a user might have a different locale), then
> systemd-journald won't be able to trust it either, so it won't know whether
> it could add the BOM.
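The check referred to here can be sketched in Python, whose `locale.nl_langinfo` wraps the C function of the same name. This is an assumption about how a daemon might do it, not rsyslog's code, and as the quoted text says, it only describes the environment of the process asking, not other users' locales or a remote sender's. The helper names are hypothetical.

```python
import locale


def is_utf8_codeset(codeset: str) -> bool:
    # Be lenient about spelling variants ("UTF-8", "utf8").
    return codeset.replace("-", "").lower() == "utf8"


def locale_is_utf8() -> bool:
    """Ask nl_langinfo(CODESET) whether this process's locale is UTF-8 (Unix only)."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # adopt $LC_ALL / $LC_CTYPE / $LANG
    except locale.Error:
        pass  # unknown locale in the environment: keep the current one
    return is_utf8_codeset(locale.nl_langinfo(locale.CODESET))


if __name__ == "__main__":
    print(locale_is_utf8())
```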

How could a remote syslog server know what the locale on the sending system
is?

> 
> RFC 3164 over the network to a remote server? Outside the scope for
> systemd, since it doesn't generate the network packets; your local rsyslogd
> forwarder does. (Also, why RFC 3164 and not 5425?)

If you look outside the world of systemd, about 99% of systems create the RFC
3164 type of messages.
Some may send non-ASCII too, however.

> 
> Generally, if a message successfully decodes as UTF-8 then it's most likely
> actual UTF-8 (and if UTF-8 decode fails then you fall back to ISO8859-1).
> Various old systems get away with this without needing a UTF-8 BOM.
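The heuristic described above, sketched in Python under the assumption that MSG is available as raw bytes. Because ISO 8859-1 assigns a character to every byte value, the fallback cannot fail, though it will mis-render text that was in some other 8-bit charset. Stripping a leading BOM first is an addition of this sketch, not part of the quoted advice; `decode_syslog_msg` is a hypothetical name.

```python
BOM = b"\xef\xbb\xbf"


def decode_syslog_msg(raw: bytes) -> str:
    """Try UTF-8 first; if that fails, fall back to ISO 8859-1 (never fails)."""
    if raw.startswith(BOM):
        raw = raw[len(BOM):]  # honour an RFC 5424 BOM if a sender did add one
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


if __name__ == "__main__":
    print(ascii(decode_syslog_msg(b"/var/run/svnserve \xe2\x86\x92 /run/svnserve")))
    print(ascii(decode_syslog_msg(b"Gr\xfc\xdfe")))  # 'Gr\xfc\xdfe', i.e. "Grüße" via fallback
```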

Yes, you can just output what you received, hoping the messages will be
presented correctly.
It's just like sending 8-bit e-mail without any encoding or charset
declaration, as was done in the past.

Regards,
Ulrich