I had a little debugging session with awolk@ over at #openbsd-daily. His
smtpd would over time end up with hung sessions that never timeout.
The problem is related to the data_io path's congestion control which
may pause the session. In this case the io system will not wait for
read events and as such will not have a chance to timeout until it is
resumed.
If the pause happens when a full message is just about to pass through
the data_io path, the session is never resumed, even though there is
obviously no more congestion and the session should be reading more
input from the client again.
A debug trace excerpt shows the course of events:
mtp: 0xe54baa1e000: IO_DATAIN <io:0xe551d0d5000 fd=25 to=300000 fl=R ib=16839
ob=0>
debug: smtp: 0xe54baa1e000: filter congestion: pausing session
smtp: 0xe54baa1e000 (data): IO_LOWAT <io:0xe55ac8e8c00 fd=32 to=-1 fl=W ib=0
ob=0>
debug: smtp: 0xe54baa1e000: data io done (259146 bytes)
debug: 0xe54baa1e000: end of message, msgflags=0x0000
smtp: 0xe54baa1e000: >>> 250 2.0.0: 4f36f9e3 Message accepted for delivery
smtp: 0xe54baa1e000: STATE_BODY -> STATE_HELO
smtp: 0xe54baa1e000: IO_LOWAT <io:0xe551d0d5000 fd=25 to=300000 fl=W,F_PI ib=0
ob=0>
>From this point on, session 0xe54baa1e000 and its io 0xe551d0d5000
(which has the pause_in flag) are never seen again in the trace, and
fstat shows a corresponding connection to smtpd that never goes away.
The proposed fix is to always resume the session if the data_io path
hits the low water mark.
Mr. Wolk tested this diff against smtpd on 6.1 as well as a against
-current version of smtpd (compiled on the same system running 6.1).
Index: usr.sbin/smtpd/smtp_session.c
===================================================================
RCS file: /cvs/src/usr.sbin/smtpd/smtp_session.c,v
retrieving revision 1.303
diff -u -p -r1.303 smtp_session.c
--- usr.sbin/smtpd/smtp_session.c 17 May 2017 14:00:06 -0000 1.303
+++ usr.sbin/smtpd/smtp_session.c 15 Jun 2017 20:28:12 -0000
@@ -1474,9 +1474,10 @@ smtp_data_io(struct io *io, int evt, voi
break;
case IO_LOWAT:
- if (s->tx->dataeom && io_queued(s->tx->oev) == 0) {
+ if (s->tx->dataeom && io_queued(s->tx->oev) == 0)
smtp_data_io_done(s);
- } else if (io_paused(s->io, IO_IN)) {
+
+ if (io_paused(s->io, IO_IN)) {
log_debug("debug: smtp: %p: filter congestion over:
resuming session", s);
io_resume(s->io, IO_IN);
}