On Fri, Jun 16, 2017 at 01:05:25AM +0300, Henri Kemppainen wrote:
> I had a little debugging session with awolk@ over at #openbsd-daily.  His
> smtpd would over time end up with hung sessions that never timeout.
> 
> The problem is related to the data_io path's congestion control which
> may pause the session.  In this case the io system will not wait for
> read events and as such will not have a chance to timeout until it is
> resumed.
> 
> If the pause happens when a full message is just about to pass through
> the data_io path, the session is never resumed, even though there is
> obviously no more congestion and the session should be reading more
> input from the client again.
> 
> A debug trace excerpt shows the course of events:
> 
> mtp: 0xe54baa1e000: IO_DATAIN <io:0xe551d0d5000 fd=25 to=300000 fl=R ib=16839 
> ob=0>
> debug: smtp: 0xe54baa1e000: filter congestion: pausing session
> smtp: 0xe54baa1e000 (data): IO_LOWAT <io:0xe55ac8e8c00 fd=32 to=-1 fl=W ib=0 
> ob=0>
> debug: smtp: 0xe54baa1e000: data io done (259146 bytes)
> debug: 0xe54baa1e000: end of message, msgflags=0x0000
> smtp: 0xe54baa1e000: >>> 250 2.0.0: 4f36f9e3 Message accepted for delivery
> smtp: 0xe54baa1e000: STATE_BODY -> STATE_HELO
> smtp: 0xe54baa1e000: IO_LOWAT <io:0xe551d0d5000 fd=25 to=300000 fl=W,F_PI 
> ib=0 ob=0>
> 
> From this point on, session 0xe54baa1e000 and its io 0xe551d0d5000
> (which has the pause_in flag) are never seen again in the trace, and
> fstat shows a corresponding connection to smtpd that never goes away.
> 
> The proposed fix is to always resume the session if the data_io path
> hits the low water mark.
> 
> Mr. Wolk tested this diff against smtpd on 6.1 as well as a against
> -current version of smtpd (compiled on the same system running 6.1).
> 

Nice catch, the diff reads fine to me, I'll commit later today when I
have another ok from eric@

side note, smtpd should not have been able to leak more than 5 fd, it
should not have been able to exhaust, is this what you observed ?


> Index: usr.sbin/smtpd/smtp_session.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/smtpd/smtp_session.c,v
> retrieving revision 1.303
> diff -u -p -r1.303 smtp_session.c
> --- usr.sbin/smtpd/smtp_session.c     17 May 2017 14:00:06 -0000      1.303
> +++ usr.sbin/smtpd/smtp_session.c     15 Jun 2017 20:28:12 -0000
> @@ -1474,9 +1474,10 @@ smtp_data_io(struct io *io, int evt, voi
>               break;
>  
>       case IO_LOWAT:
> -             if (s->tx->dataeom && io_queued(s->tx->oev) == 0) {
> +             if (s->tx->dataeom && io_queued(s->tx->oev) == 0)
>                       smtp_data_io_done(s);
> -             } else if (io_paused(s->io, IO_IN)) {
> +
> +             if (io_paused(s->io, IO_IN)) {
>                       log_debug("debug: smtp: %p: filter congestion over: 
> resuming session", s);
>                       io_resume(s->io, IO_IN);
>               }
> 
> 

-- 
Gilles Chehade

https://www.poolp.org                                          @poolpOrg

Reply via email to