On Fri, Jun 16, 2017 at 01:05:25AM +0300, Henri Kemppainen wrote:
> I had a little debugging session with awolk@ over at #openbsd-daily. His
> smtpd would over time end up with hung sessions that never timeout.
>
> The problem is related to the data_io path's congestion control which
> may pause the session. In this case the io system will not wait for
> read events and as such will not have a chance to timeout until it is
> resumed.
>
> If the pause happens when a full message is just about to pass through
> the data_io path, the session is never resumed, even though there is
> obviously no more congestion and the session should be reading more
> input from the client again.
>
> A debug trace excerpt shows the course of events:
>
> mtp: 0xe54baa1e000: IO_DATAIN <io:0xe551d0d5000 fd=25 to=300000 fl=R ib=16839
> ob=0>
> debug: smtp: 0xe54baa1e000: filter congestion: pausing session
> smtp: 0xe54baa1e000 (data): IO_LOWAT <io:0xe55ac8e8c00 fd=32 to=-1 fl=W ib=0
> ob=0>
> debug: smtp: 0xe54baa1e000: data io done (259146 bytes)
> debug: 0xe54baa1e000: end of message, msgflags=0x0000
> smtp: 0xe54baa1e000: >>> 250 2.0.0: 4f36f9e3 Message accepted for delivery
> smtp: 0xe54baa1e000: STATE_BODY -> STATE_HELO
> smtp: 0xe54baa1e000: IO_LOWAT <io:0xe551d0d5000 fd=25 to=300000 fl=W,F_PI
> ib=0 ob=0>
>
> From this point on, session 0xe54baa1e000 and its io 0xe551d0d5000
> (which has the pause_in flag) are never seen again in the trace, and
> fstat shows a corresponding connection to smtpd that never goes away.
>
> The proposed fix is to always resume the session if the data_io path
> hits the low water mark.
>
> Mr. Wolk tested this diff against smtpd on 6.1 as well as a against
> -current version of smtpd (compiled on the same system running 6.1).
>
Nice catch, the diff reads fine to me, I'll commit later today when I
have another ok from eric@
side note, smtpd should not have been able to leak more than 5 fd, it
should not have been able to exhaust, is this what you observed ?
> Index: usr.sbin/smtpd/smtp_session.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/smtpd/smtp_session.c,v
> retrieving revision 1.303
> diff -u -p -r1.303 smtp_session.c
> --- usr.sbin/smtpd/smtp_session.c 17 May 2017 14:00:06 -0000 1.303
> +++ usr.sbin/smtpd/smtp_session.c 15 Jun 2017 20:28:12 -0000
> @@ -1474,9 +1474,10 @@ smtp_data_io(struct io *io, int evt, voi
> break;
>
> case IO_LOWAT:
> - if (s->tx->dataeom && io_queued(s->tx->oev) == 0) {
> + if (s->tx->dataeom && io_queued(s->tx->oev) == 0)
> smtp_data_io_done(s);
> - } else if (io_paused(s->io, IO_IN)) {
> +
> + if (io_paused(s->io, IO_IN)) {
> log_debug("debug: smtp: %p: filter congestion over:
> resuming session", s);
> io_resume(s->io, IO_IN);
> }
>
>
--
Gilles Chehade
https://www.poolp.org @poolpOrg