I have encountered the same problem, and I am trying to disable OS keep-alive to relieve it.
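
For reference, a minimal sketch of that workaround, assuming "OS keep-alive" here means the two outgoing keep-alive settings named in the quoted message below, and that they are set in records.config (the values are my assumption, not a tested recommendation):

```
# records.config: turn off keep-alive on connections from TrafficServer to the origin server.
# Assumption: 0 disables each setting; the quoted message says keep-alive is enabled by default.
CONFIG proxy.config.http.keep_alive_enabled_out INT 0
CONFIG proxy.config.http.keep_alive_post_out INT 0
```

This trades connection reuse for a new origin connection on each request, so it should only relieve the 502s rather than fix the underlying retry behavior.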
At 2015-10-07 07:19:26, "Nick Muerdter" <[email protected]> wrote:
>On Tue, Oct 6, 2015, at 04:33 PM, James Peach wrote:
>>
>> > On Oct 4, 2015, at 9:16 AM, Nick Muerdter <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I've observed some differences in how TrafficServer 6.0.0 behaves with
>> > connection retrying and outgoing keep-alive connections. I believe the
>> > changes in behavior might be related to this issue:
>> > https://issues.apache.org/jira/browse/TS-3440 However, I wasn't sure if
>> > the new behavior (specifically around keep-alive handling) was
>> > intentional or not, so I thought I'd ping the mailing list.
>> >
>> > What I'm seeing in 6.0.0 is that if TrafficServer has some backend
>> > keep-alive connections already opened, but then one of the keep-alive
>> > connections is closed, the next request to TrafficServer may generate a
>> > 502 Server Hangup response when attempting to reuse that connection.
>> > Previously, I think TrafficServer was retrying when it encountered a
>> > closed keep-alive connection, but that is no longer the case. So if you
>> > have a backend that might unexpectedly close its open keep-alive
>> > connections, the only way I've found to completely prevent these 502
>> > errors in 6.0.0 is to disable outgoing keepalive
>> > (proxy.config.http.keep_alive_enabled_out and
>> > proxy.config.http.keep_alive_post_out settings).
>> >
>> > For a slightly more concrete example of what can trigger this, this is
>> > fairly easy to reproduce with the following setup:
>> >
>> > - TrafficServer is proxying to nginx with outgoing keep-alive
>> >   connections enabled (the default).
>> > - Throw a constant stream of requests at TrafficServer.
>> > - While that constant stream of requests is happening, also send a
>> >   regular stream of SIGHUP commands to nginx to reload nginx.
>> > - Eventually you'll get some 502 Server Hangup responses from
>> >   TrafficServer among your stream of requests.
>> >
>> > SIGHUPs in nginx should result in zero downtime for new requests, but I
>> > think what's happening is that TrafficServer may fail when an old
>> > keep-alived connection is reused (it's not common, so it depends on the
>> > timing of things and if the connection is from an old nginx worker that
>> > has since been shut down). In TrafficServer 5.3.1 these connection
>> > failures were retried, but in 6.0.0, no retries occur in this case.
>> >
>> > Here's some debug logs that show the difference in behavior between
>> > 6.0.0 and 5.3.1. Note that differences seem to stem from how each
>> > version eventually handles the "VC_EVENT_EOS" event following
>> > "&HttpSM::state_send_server_request_header, VC_EVENT_WRITE_COMPLETE".
>> >
>> > 5.3.1:
>> > https://gist.github.com/GUI/0c53a6c4fdc2782b14aa#file-trafficserver_5-3-1-log-L316
>> > 6.0.0:
>> > https://gist.github.com/GUI/0c53a6c4fdc2782b14aa#file-trafficserver_6-0-0-log-L314
>> >
>> > Interestingly, if I'm understanding the log files correctly, it looks like
>> > TrafficServer is reporting an odd empty response from these connections
>> > ("HTTP/0.9 0" in 5.3.1 and "HTTP/1.0 0" in 6.0.0). However, as far as I
>> > can tell from TCP dumps on the system, nginx is not actually sending any
>> > form of response.
>> >
>> > So my basic question is whether the new behavior in 6.0.0 is correct or
>> > not. Based on the discussion in
>> > https://issues.apache.org/jira/browse/TS-3440 I'm unsure whether 5.3.1
>> > retrying on these closed keep-alive connections was actually safe or
>> > not. In these example cases the backend server isn't sending back any
>> > data (at least as far as I can tell), so from what I understand, it
>> > should be safe to retry. However, I'm not totally sure that this
>> > situation with dead keep-alive connections can properly be distinguished
>> > from other types of hangups or connection errors, so perhaps it isn't
>> > safe.
>> >
>> > If the 6.0.0 behavior is correct, is disabling outgoing keep-alive
>> > connections the best option if I'm worried about backend services
>> > unexpectedly killing off old keep-alive connections? Or is this a bug
>> > with 6.0.0, and should TrafficServer retries technically be possible in
>> > these cases?
>>
>> Hi Nick,
>>
>> This sounds like a 6.0 regression to me. Can you file the above
>> information in Jira?
>>
>> thanks,
>> James
>
>Thanks for the sanity check! I've filed an issue:
>https://issues.apache.org/jira/browse/TS-3959
