Re: Including remote addresses in smtpd syslog output

2020-05-26 Thread Kevin Zheng
Hi Gilles,

On 5/26/20 12:04 AM, gil...@poolp.org wrote:
> We now provide a reporting API which is basically a stream of events that can
> be consumed by tools. It is a line-based format which is not meant to be read
> by humans but meant to be easily parsed by tools and that provides all of the
> information necessary to replicate the session states. Using this stream, one
> can write a tiny filter which aggregates info and outputs logs tailored for a
> specific third-party application with a guarantee that it won't break when we
> make a subtle change to the maillog format. If I were working on SSHGuard for
> example, I'd write an sshguard-exporter script that reads the stream and that
> outputs to syslog a format SSHguard recognizes. This way, an smtpd user would
> simply:
> 
> filter sshguard proc-exec "sshguard-exporter"
> listen on all filter sshguard
> action "foobar" relay filter sshguard
> 
> SSHguard itself would never need to be altered to follow changes in logs.

Thanks makes sense to me. I was vaguely aware that actions and filters
became available, but I didn't know that they could do this. I think
this is exactly what I was looking for.

Thanks,
Kevin

-- 
Kevin Zheng
kevinz5...@gmail.com | kev...@berkeley.edu
XMPP: kev...@eecs.berkeley.edu



Re: Including remote addresses in smtpd syslog output

2020-05-26 Thread gilles
May 26, 2020 4:36 AM, "Kevin Zheng"  wrote:

> Hi folks,

Hellow,

> [...]
> 
> Chasing changes in syslog output is a part of maintaining software like
> SSHGuard. Unfortunately, my parser (which recently learned how to
> pledge!) is a bit dull and would require some re-education to remember
> SMTP sessions and their associated IP addresses. So, my questions are:
> 
> Why did OpenSMTPD stop reporting IP addresses on every line?
> 
> Is there any chance that OpenSMTPD can put IP addresses back on every line?

Sorry, I'll write a detailed answer for future reference:

A long time ago, people started relying on logs to write all kinds of scripts
and processors, which led to requests to add some info here and another there
until the logs became so dense that they scared users. People writing scripts
were happy, humans reading logs were overwhelmed with redundant informations.

Around this time, Theo asked me to make logs more readable, ideally so they'd
fit terminal lines for most logs. I did some compaction (i.e.: source -> src)
but this didn't help much because, as you mention, scripts used parsers which
didn't keep state and logs grew dense to keep state on their behalf. What did
not help further is that slight changes in format broke all existing scripts.


So it became obvious we were doing something wrong:

We WANT to make it easy for tools to extend OpenSMTPD, but this can't come at
the cost of freezing our maillog format or at the cost of users having harder
time going through maillog. The consumers, humans and scripts, have different
expectations so they need to consume different logs.


We now provide a reporting API which is basically a stream of events that can
be consumed by tools. It is a line-based format which is not meant to be read
by humans but meant to be easily parsed by tools and that provides all of the
information necessary to replicate the session states. Using this stream, one
can write a tiny filter which aggregates info and outputs logs tailored for a
specific third-party application with a guarantee that it won't break when we
make a subtle change to the maillog format. If I were working on SSHGuard for
example, I'd write an sshguard-exporter script that reads the stream and that
outputs to syslog a format SSHguard recognizes. This way, an smtpd user would
simply:

filter sshguard proc-exec "sshguard-exporter"
listen on all filter sshguard
action "foobar" relay filter sshguard

SSHguard itself would never need to be altered to follow changes in logs.


We decided to turn back maillog format into a human-oriented format, one that
guarantees you will find your info with two grep, but that does not guarantee
it won't have subtle changes at each release if they make logs more readable.
The maillog format logs events and their associated data, removing redundancy
and not keeping context across logs.

This allows me, as a human to easily grep for info and not be overflowed with
a lot of unrelated log lines that happen to carry the same information:

$ grep 'smtp connected' /var/log/maillog|tail -3
May 26 06:22:33 in smtpd[75500]: 0192b8a91f05254a smtp connected 
address=199.185.178.25
host=mail.openbsd.org
May 26 06:24:49 in smtpd[75500]: 0192b8ac864dd3e1 smtp connected 
address=199.185.178.25
host=mail.openbsd.org
May 26 06:24:56 in smtpd[75500]: 0192b8b1f1c2f60c smtp connected 
address=199.185.178.25
host=mail.openbsd.org

$ grep 'address=199.185.178.25' /var/log/maillog|tail -3
May 26 06:13:38 in smtpd[75500]: 0192b895dd7ab75f smtp connected 
address=199.185.178.25
host=mail.openbsd.org
May 26 06:22:12 in smtpd[75500]: 0192b8a43a226ada smtp connected 
address=199.185.178.25
host=mail.openbsd.org
May 26 06:22:33 in smtpd[75500]: 0192b8a91f05254a smtp connected 
address=199.185.178.25
host=mail.openbsd.org

And this also allows me to get a very compact format for an entire session:

$ grep '0192b8b1f1c2f60c' /var/log/maillog
May 26 06:24:56 in smtpd[75500]: 0192b8b1f1c2f60c smtp connected 
address=199.185.178.25
host=mail.openbsd.org
May 26 06:24:57 in smtpd[75500]: 0192b8b1f1c2f60c smtp tls
ciphers=TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256
May 26 06:25:04 in smtpd[75500]: 0192b8b1f1c2f60c smtp message msgid=4baeb9ac 
size=3110 nrcpt=1
proto=ESMTP
May 26 06:25:04 in smtpd[75500]: 0192b8b1f1c2f60c smtp envelope 
evpid=4baeb9ac8c6799b0
from= to=
May 26 06:25:14 in smtpd[75500]: 0192b8b1f1c2f60c smtp disconnected reason=quit


With this split, human log vs reporting api for tools, tomorrow we can decide
in the sample above to rename "message" into "msg" knowing that human readers
will not be affected and that scripts will not break as long as they use that
reporting API.

Hope it clarifies a bit,
Gilles