Re:Re: TrafficServer 6, keep-alive, connection retries, and 502 Server Hangups

Esmq, Sun, 06 Dec 2015 19:33:07 -0800

I have encountered the same problem,
and I am trying to disable OS keep-alive to relieve it.



At 2015-10-07 07:19:26, "Nick Muerdter" <[email protected]> wrote:
>On Tue, Oct 6, 2015, at 04:33 PM, James Peach wrote:
>> 
>> > On Oct 4, 2015, at 9:16 AM, Nick Muerdter <[email protected]> wrote:
>> > 
>> > Hi,
>> > 
>> > I've observed some differences in how TrafficServer 6.0.0 behaves with
>> > connection retrying and outgoing keep-alive connections. I believe the
>> > changes in behavior might be related to this issue:
>> > https://issues.apache.org/jira/browse/TS-3440 However, I wasn't sure if
>> > the new behavior (specifically around keep-alive handling) was
>> > intentional or not, so I thought I'd ping the mailing list.
>> > 
>> > What I'm seeing in 6.0.0 is that if TrafficServer has some backend
>> > keep-alive connections already opened, but then one of the keep-alive
>> > connections is closed, the next request to TrafficServer may generate a
>> > 502 Server Hangup response when attempting to reuse that connection.
>> > Previously, I think TrafficServer was retrying when it encountered a
>> > closed keep-alive connection, but that is no longer the case. So if you
>> > have a backend that might unexpectedly close its open keep-alive
>> > connections, the only way I've found to completely prevent these 502
>> > errors in 6.0.0 is to disable outgoing keepalive
>> > (proxy.config.http.keep_alive_enabled_out and
>> > proxy.config.http.keep_alive_post_out settings).
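For reference, the two settings named above go in records.config as lines of the form `CONFIG <name> INT <value>`, where 0 turns the feature off. The small Python helper below only renders those override lines. The dictionary and function names are illustrative and are not part of TrafficServer.

```python
# Outgoing (TrafficServer -> origin) keep-alive settings named in the thread.
# 0 disables the behaviour; outgoing keep-alive is enabled by default.
DISABLE_OUTGOING_KEEPALIVE = {
    "proxy.config.http.keep_alive_enabled_out": 0,
    "proxy.config.http.keep_alive_post_out": 0,
}


def records_config_lines(settings: dict) -> list:
    """Render integer settings as records.config CONFIG lines."""
    return [f"CONFIG {name} INT {value}" for name, value in settings.items()]


for line in records_config_lines(DISABLE_OUTGOING_KEEPALIVE):
    print(line)
```

The trade-off is that every proxied request then opens a fresh connection to the origin, and TrafficServer needs a configuration reload or restart to pick up the change.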
>> > 
>> > For a slightly more concrete example of what can trigger this, this is
>> > fairly easy to reproduce with the following setup:
>> > 
>> > - TrafficServer is proxying to nginx with outgoing keep-alive
>> > connections enabled (the default).
>> > - Throw a constant stream of requests at TrafficServer.
>> > - While that constant stream of requests is happening, also send a
>> > regular stream of SIGHUP commands to nginx to reload nginx.
>> > - Eventually you'll get some 502 Server Hangup responses from
>> > TrafficServer among your stream of requests.
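The reproduction recipe above can be sketched in Python roughly as follows. It assumes TrafficServer listens on its default port 8080 on localhost and proxies to nginx, and that the nginx master PID is stored in /run/nginx.pid; both values, like the function names, are hypothetical placeholders to adjust. It sends sequential GETs through the proxy while a background thread sends SIGHUP to nginx, then tallies status codes so any 502s stand out.

```python
import os
import signal
import threading
import urllib.error
import urllib.request
from collections import Counter

# Hypothetical values: adjust for your setup. The nginx PID file location
# varies by distribution and build.
PROXY_URL = "http://127.0.0.1:8080/"
NGINX_PID_FILE = "/run/nginx.pid"


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of one GET sent through the proxy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, including a 502 Server Hangup, are raised here.
        return err.code


def reload_nginx_periodically(pid_file: str, interval: float,
                              stop: threading.Event) -> None:
    """Send SIGHUP to the nginx master every `interval` seconds until stopped."""
    # Event.wait returns True once `stop` is set, which ends the loop.
    while not stop.wait(interval):
        with open(pid_file) as f:
            os.kill(int(f.read().strip()), signal.SIGHUP)


def summarize(statuses) -> Counter:
    """Tally status codes so 502s stand out among the successes."""
    return Counter(statuses)


def reproduce(n_requests: int = 10_000, reload_interval: float = 0.5) -> Counter:
    stop = threading.Event()
    reloader = threading.Thread(
        target=reload_nginx_periodically,
        args=(NGINX_PID_FILE, reload_interval, stop),
        daemon=True,
    )
    reloader.start()
    try:
        statuses = [fetch_status(PROXY_URL) for _ in range(n_requests)]
    finally:
        stop.set()
        reloader.join()
    return summarize(statuses)
```

Calling `reproduce()` returns a Counter of status codes; per the report above, 6.0.0 may show occasional 502 entries where 5.3.1 retried the stale connection.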
>> > 
>> > SIGHUPs in nginx should result in zero downtime for new requests, but I
>> > think what's happening is that TrafficServer may fail when an old
>> > kept-alive connection is reused (it's not common, so it depends on the
>> > timing of things and if the connection is from an old nginx worker that
>> > has since been shut down). In TrafficServer 5.3.1 these connection
>> > failures were retried, but in 6.0.0, no retries occur in this case.
>> > 
>> > Here are some debug logs that show the difference in behavior between
>> > 6.0.0 and 5.3.1. Note that the differences seem to stem from how each
>> > version eventually handles the "VC_EVENT_EOS" event following
>> > "&HttpSM::state_send_server_request_header, VC_EVENT_WRITE_COMPLETE".
>> > 
>> > 5.3.1:
>> > https://gist.github.com/GUI/0c53a6c4fdc2782b14aa#file-trafficserver_5-3-1-log-L316
>> > 6.0.0:
>> > https://gist.github.com/GUI/0c53a6c4fdc2782b14aa#file-trafficserver_6-0-0-log-L314
>> > 
>> > Interestingly, if I understand the log files correctly, it looks like
>> > TrafficServer is reporting an odd empty response from these connections
>> > ("HTTP/0.9 0" in 5.3.1 and "HTTP/1.0 0" in 6.0.0). However, as far as I
>> > can tell from TCP dumps on the system, nginx is not actually sending any
>> > form of response.
>> > 
>> > So my basic question is whether the new behavior in 6.0.0 is correct or
>> > not. Based on the discussion in
>> > https://issues.apache.org/jira/browse/TS-3440 I'm unsure whether 5.3.1
>> > retrying on these closed keep-alive connections was actually safe or
>> > not. In these example cases the backend server isn't sending back any
>> > data (at least as far as I can tell), so from what I understand, it
>> > should be safe to retry. However, I'm not totally sure that this
>> > situation with dead keep-alive connections can properly be distinguished
>> > from other types of hangups or connection errors, so perhaps it isn't
>> > safe.
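To make the distinction raised above concrete, here is a hypothetical retry policy sketched in Python; it is not TrafficServer's actual logic, and the class and field names are illustrative. It replays a request only when it failed on a reused keep-alive connection before any response bytes arrived, and, following the HTTP/1.1 rule in RFC 7230 section 6.3.1 on automatic retries, only for idempotent methods.

```python
from dataclasses import dataclass

# Methods RFC 7231 section 4.2.2 defines as idempotent.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"})


@dataclass
class FailedAttempt:
    method: str
    reused_keepalive: bool        # request went out on a pooled origin connection
    response_bytes_received: int  # bytes read from the origin before EOS/close
    attempts_so_far: int


def should_retry(attempt: FailedAttempt, max_attempts: int = 3) -> bool:
    """Hypothetical policy for an origin connection that hit EOS."""
    if attempt.attempts_so_far >= max_attempts:
        return False
    if attempt.response_bytes_received > 0:
        # The origin had started answering; replaying could duplicate work
        # or splice two responses together.
        return False
    if not attempt.reused_keepalive:
        # A fresh connection failing points at a real origin problem.
        return False
    # A stale pooled connection most likely never reached a live worker
    # (e.g. one retired by an nginx SIGHUP), but that is an inference, so
    # only replay methods that are safe to repeat.
    return attempt.method.upper() in IDEMPOTENT_METHODS
```

The idempotency check is what keeps this safe in the ambiguous case: an origin that closed an idle connection just as the request arrived and one that processed the request and then dropped the connection both look like EOS with zero response bytes to the proxy.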
>> > 
>> > If the 6.0.0 behavior is correct, is disabling outgoing keep-alive
>> > connections the best option if I'm worried about backend services
>> > unexpectedly killing off old keep-alive connections? Or is this a bug
>> > with 6.0.0, and should TrafficServer retries technically be possible in
>> > these cases?
>> 
>> Hi Nick,
>> 
>> This sounds like a 6.0 regression to me. Can you file the above
>> information in Jira?
>> 
>> thanks,
>> James
>
>Thanks for the sanity check! I've filed an issue:
>https://issues.apache.org/jira/browse/TS-3959
