Re: smtpd session hang

2017-06-19 Thread Gilles Chehade
On Fri, Jun 16, 2017 at 07:12:43PM +0300, Henri Kemppainen wrote:
> > > Nice catch, the diff reads fine to me, I'll commit later today when I
> > > have another ok from eric@
> 
> > Yes, this looks correct. But, I would rather move the resume test before
> > the EOM test, to avoid touching the session after the transfer has been
> > finalized by smtp_data_io_done().
> 
> It oughtn't make a difference as long as the duplex control is correct,
> but I'm fine with that change.
> 
> > > side note, smtpd should not have been able to leak more than 5 fd, it
> > > should not have been able to exhaust, is this what you observed ?
> 
> For the record, we discussed this with Gilles on irc and while we saw
> more than a dozen leaked fds, it's okay as smtpd will allow as many
> incoming sessions as the dtable can accommodate (with an fd reserve).
> The lower limits are on outgoing connections.
> 
> New diff with reordered code.  I'll see if I can get Adam to run one more
> round of testing..
> 

Committed thanks :)
-- 
Gilles Chehade

https://www.poolp.org  @poolpOrg



Re: smtpd session hang

2017-06-16 Thread adam . wolk
On Fri, Jun 16, 2017 at 07:12:43PM +0300, Henri Kemppainen wrote:
> > > Nice catch, the diff reads fine to me, I'll commit later today when I
> > > have another ok from eric@
> 
> > Yes, this looks correct. But, I would rather move the resume test before
> > the EOM test, to avoid touching the session after the transfer has been
> > finalized by smtp_data_io_done().
> 
> It oughtn't make a difference as long as the duplex control is correct,
> but I'm fine with that change.
> 
> > > side note, smtpd should not have been able to leak more than 5 fd, it
> > > should not have been able to exhaust, is this what you observed ?
> 
> For the record, we discussed this with Gilles on irc and while we saw
> more than a dozen leaked fds, it's okay as smtpd will allow as many
> incoming sessions as the dtable can accommodate (with an fd reserve).
> The lower limits are on outgoing connections.
> 
> New diff with reordered code.  I'll see if I can get Adam to run one more
> round of testing..
> 

Tested the diff on -current OpenSMTPD running on a 6.1 server.
Sent 11 emails with ~2MB of base64 encoded garbage (like with our previous 
tests)
all emails were delivered with no FD leaks.

I am also willing to test an errata diff on 6.1 if this qualifies 
(reliability/performance).

> 
> Index: usr.sbin/smtpd/smtp_session.c
> ===
> RCS file: /cvs/src/usr.sbin/smtpd/smtp_session.c,v
> retrieving revision 1.303
> diff -u -p -r1.303 smtp_session.c
> --- usr.sbin/smtpd/smtp_session.c 17 May 2017 14:00:06 -  1.303
> +++ usr.sbin/smtpd/smtp_session.c 16 Jun 2017 14:56:11 -
> @@ -1474,12 +1474,12 @@ smtp_data_io(struct io *io, int evt, voi
>   break;
>  
>   case IO_LOWAT:
> - if (s->tx->dataeom && io_queued(s->tx->oev) == 0) {
> - smtp_data_io_done(s);
> - } else if (io_paused(s->io, IO_IN)) {
> + if (io_paused(s->io, IO_IN)) {
>   log_debug("debug: smtp: %p: filter congestion over: 
> resuming session", s);
>   io_resume(s->io, IO_IN);
>   }
> + if (s->tx->dataeom && io_queued(s->tx->oev) == 0)
> + smtp_data_io_done(s);
>   break;
>  
>   default:
> 



Re: smtpd session hang

2017-06-16 Thread Henri Kemppainen
> > Nice catch, the diff reads fine to me, I'll commit later today when I
> > have another ok from eric@

> Yes, this looks correct. But, I would rather move the resume test before
> the EOM test, to avoid touching the session after the transfer has been
> finalized by smtp_data_io_done().

It oughtn't make a difference as long as the duplex control is correct,
but I'm fine with that change.

> > side note, smtpd should not have been able to leak more than 5 fd, it
> > should not have been able to exhaust, is this what you observed ?

For the record, we discussed this with Gilles on irc and while we saw
more than a dozen leaked fds, it's okay as smtpd will allow as many
incoming sessions as the dtable can accommodate (with an fd reserve).
The lower limits are on outgoing connections.

New diff with reordered code.  I'll see if I can get Adam to run one more
round of testing..


Index: usr.sbin/smtpd/smtp_session.c
===
RCS file: /cvs/src/usr.sbin/smtpd/smtp_session.c,v
retrieving revision 1.303
diff -u -p -r1.303 smtp_session.c
--- usr.sbin/smtpd/smtp_session.c   17 May 2017 14:00:06 -  1.303
+++ usr.sbin/smtpd/smtp_session.c   16 Jun 2017 14:56:11 -
@@ -1474,12 +1474,12 @@ smtp_data_io(struct io *io, int evt, voi
break;
 
case IO_LOWAT:
-   if (s->tx->dataeom && io_queued(s->tx->oev) == 0) {
-   smtp_data_io_done(s);
-   } else if (io_paused(s->io, IO_IN)) {
+   if (io_paused(s->io, IO_IN)) {
log_debug("debug: smtp: %p: filter congestion over: 
resuming session", s);
io_resume(s->io, IO_IN);
}
+   if (s->tx->dataeom && io_queued(s->tx->oev) == 0)
+   smtp_data_io_done(s);
break;
 
default:



Re: smtpd session hang

2017-06-16 Thread Eric Faurot
On Fri, Jun 16, 2017 at 09:11:29AM +0200, Gilles Chehade wrote:
> On Fri, Jun 16, 2017 at 01:05:25AM +0300, Henri Kemppainen wrote:
> > I had a little debugging session with awolk@ over at #openbsd-daily.  His
> > smtpd would over time end up with hung sessions that never timeout.
> > 
> > The problem is related to the data_io path's congestion control which
> > may pause the session.  In this case the io system will not wait for
> > read events and as such will not have a chance to timeout until it is
> > resumed.
> > 
> > If the pause happens when a full message is just about to pass through
> > the data_io path, the session is never resumed, even though there is
> > obviously no more congestion and the session should be reading more
> > input from the client again.
> > 
> > A debug trace excerpt shows the course of events:
> > 
> > mtp: 0xe54baa1e000: IO_DATAIN  > ib=16839 ob=0>
> > debug: smtp: 0xe54baa1e000: filter congestion: pausing session
> > smtp: 0xe54baa1e000 (data): IO_LOWAT  > ib=0 ob=0>
> > debug: smtp: 0xe54baa1e000: data io done (259146 bytes)
> > debug: 0xe54baa1e000: end of message, msgflags=0x
> > smtp: 0xe54baa1e000: >>> 250 2.0.0: 4f36f9e3 Message accepted for delivery
> > smtp: 0xe54baa1e000: STATE_BODY -> STATE_HELO
> > smtp: 0xe54baa1e000: IO_LOWAT  > ib=0 ob=0>
> > 
> > From this point on, session 0xe54baa1e000 and its io 0xe551d0d5000
> > (which has the pause_in flag) are never seen again in the trace, and
> > fstat shows a corresponding connection to smtpd that never goes away.
> > 
> > The proposed fix is to always resume the session if the data_io path
> > hits the low water mark.
> > 
> > Mr. Wolk tested this diff against smtpd on 6.1 as well as a against
> > -current version of smtpd (compiled on the same system running 6.1).
> > 
> 
> Nice catch, the diff reads fine to me, I'll commit later today when I
> have another ok from eric@

Yes, this looks correct. But, I would rather move the resume test before
the EOM test, to avoid touching the session after the transfer has been
finalized by smtp_data_io_done().

Eric.

> side note, smtpd should not have been able to leak more than 5 fd, it
> should not have been able to exhaust, is this what you observed ?

> 
> > Index: usr.sbin/smtpd/smtp_session.c
> > ===
> > RCS file: /cvs/src/usr.sbin/smtpd/smtp_session.c,v
> > retrieving revision 1.303
> > diff -u -p -r1.303 smtp_session.c
> > --- usr.sbin/smtpd/smtp_session.c   17 May 2017 14:00:06 -  1.303
> > +++ usr.sbin/smtpd/smtp_session.c   15 Jun 2017 20:28:12 -
> > @@ -1474,9 +1474,10 @@ smtp_data_io(struct io *io, int evt, voi
> > break;
> >  
> > case IO_LOWAT:
> > -   if (s->tx->dataeom && io_queued(s->tx->oev) == 0) {
> > +   if (s->tx->dataeom && io_queued(s->tx->oev) == 0)
> > smtp_data_io_done(s);
> > -   } else if (io_paused(s->io, IO_IN)) {
> > +
> > +   if (io_paused(s->io, IO_IN)) {
> > log_debug("debug: smtp: %p: filter congestion over: 
> > resuming session", s);
> > io_resume(s->io, IO_IN);
> > }
> > 
> > 
> 
> -- 
> Gilles Chehade
> 
> https://www.poolp.org  @poolpOrg
> 



Re: smtpd session hang

2017-06-16 Thread Gilles Chehade
On Fri, Jun 16, 2017 at 01:05:25AM +0300, Henri Kemppainen wrote:
> I had a little debugging session with awolk@ over at #openbsd-daily.  His
> smtpd would over time end up with hung sessions that never timeout.
> 
> The problem is related to the data_io path's congestion control which
> may pause the session.  In this case the io system will not wait for
> read events and as such will not have a chance to timeout until it is
> resumed.
> 
> If the pause happens when a full message is just about to pass through
> the data_io path, the session is never resumed, even though there is
> obviously no more congestion and the session should be reading more
> input from the client again.
> 
> A debug trace excerpt shows the course of events:
> 
> mtp: 0xe54baa1e000: IO_DATAIN  ob=0>
> debug: smtp: 0xe54baa1e000: filter congestion: pausing session
> smtp: 0xe54baa1e000 (data): IO_LOWAT  ob=0>
> debug: smtp: 0xe54baa1e000: data io done (259146 bytes)
> debug: 0xe54baa1e000: end of message, msgflags=0x
> smtp: 0xe54baa1e000: >>> 250 2.0.0: 4f36f9e3 Message accepted for delivery
> smtp: 0xe54baa1e000: STATE_BODY -> STATE_HELO
> smtp: 0xe54baa1e000: IO_LOWAT  ib=0 ob=0>
> 
> From this point on, session 0xe54baa1e000 and its io 0xe551d0d5000
> (which has the pause_in flag) are never seen again in the trace, and
> fstat shows a corresponding connection to smtpd that never goes away.
> 
> The proposed fix is to always resume the session if the data_io path
> hits the low water mark.
> 
> Mr. Wolk tested this diff against smtpd on 6.1 as well as a against
> -current version of smtpd (compiled on the same system running 6.1).
> 

Nice catch, the diff reads fine to me, I'll commit later today when I
have another ok from eric@

side note, smtpd should not have been able to leak more than 5 fd, it
should not have been able to exhaust, is this what you observed ?


> Index: usr.sbin/smtpd/smtp_session.c
> ===
> RCS file: /cvs/src/usr.sbin/smtpd/smtp_session.c,v
> retrieving revision 1.303
> diff -u -p -r1.303 smtp_session.c
> --- usr.sbin/smtpd/smtp_session.c 17 May 2017 14:00:06 -  1.303
> +++ usr.sbin/smtpd/smtp_session.c 15 Jun 2017 20:28:12 -
> @@ -1474,9 +1474,10 @@ smtp_data_io(struct io *io, int evt, voi
>   break;
>  
>   case IO_LOWAT:
> - if (s->tx->dataeom && io_queued(s->tx->oev) == 0) {
> + if (s->tx->dataeom && io_queued(s->tx->oev) == 0)
>   smtp_data_io_done(s);
> - } else if (io_paused(s->io, IO_IN)) {
> +
> + if (io_paused(s->io, IO_IN)) {
>   log_debug("debug: smtp: %p: filter congestion over: 
> resuming session", s);
>   io_resume(s->io, IO_IN);
>   }
> 
> 

-- 
Gilles Chehade

https://www.poolp.org  @poolpOrg