Hello,
I am using TrafficServer as a cache for many sites. It has only one
origin, haproxy.
Details are:
Version of Traffic Server used: 7.1.12 (also applies to 8.1.0)
Platform: Linux 64 bit, gcc 8.3.0
Relevant configuration changes from the defaults (particularly in
records.config), taken from the output of `traffic_ctl config diff`:
proxy.config.http.cache.open_write_fail_action has changed
Current Value : 2
Default Value : 0
proxy.config.http.negative_revalidating_enabled has changed
Current Value : 1
Default Value : 0
proxy.config.http.negative_revalidating_lifetime has changed
Current Value : 86400
Default Value : 1800
proxy.config.http.insert_client_ip has changed
Current Value : 0
Default Value : 1
proxy.config.http.insert_squid_x_forwarded_for has changed
Current Value : 0
Default Value : 1
proxy.config.http.transaction_no_activity_timeout_in has changed
Current Value : 600
Default Value : 30
proxy.config.http.transaction_no_activity_timeout_out has changed
Current Value : 600
Default Value : 30
proxy.config.http.connect_attempts_max_retries has changed
Current Value : 0
Default Value : 3
proxy.config.http.connect_attempts_max_retries_dead_server has changed
Current Value : 0
Default Value : 1
proxy.config.http.connect_attempts_timeout has changed
Current Value : 600
Default Value : 30
proxy.config.http.post_connect_attempts_timeout has changed
Current Value : 600
Default Value : 1800
proxy.config.http.normalize_ae_gzip has changed
Current Value : 0
Default Value : 1
proxy.config.cache.ram_cache.size has changed
Current Value : 1073741824
Default Value : -1
proxy.config.url_remap.pristine_host_hdr has changed
Current Value : 1
Default Value : 0
I have trafficserver connecting to origin (haproxy) on the same host with:
cat etc/trafficserver/remap.config
map /HTTPS/ http://10.0.251.170:21443
map / http://10.0.251.170:41080
There is a constant stream of requests going to trafficserver. About 0.5% of
them fail, returning 502 to the client, with these entries in
var/log/trafficserver/error.log:
20210107.13h02m15s CONNECT: could not connect to 10.0.251.170 for 'http://10.0.251.170:41080/path/' (setting last failure time)
20210107.13h02m15s RESPONSE: sent 10.0.251.170 status 502 (Server Hangup) for 'http://10.0.251.170:41080/path/'
The same failure also shows up in var/log/trafficserver/squid.log:
1610020935.756 43 10.0.251.170 TCP_REFRESH_FAIL_HIT/502 498 GET http://10.0.251.170:41080/path/ - DIRECT/10.0.251.170 text/html
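For reference, a rough sketch of how the failure rate can be counted from squid.log. It assumes the log is written as text, one entry per line, with the cache result and status in the fourth field as in the entry above. The function names and the second sample line (a 200 hit) are made up for illustration and are not taken from my logs.

```python
from collections import Counter


def status_counts(lines):
    """Count HTTP status codes in squid-format log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # The fourth field looks like "TCP_REFRESH_FAIL_HIT/502".
        if len(fields) < 4 or "/" not in fields[3]:
            continue  # skip blank or unexpected lines
        counts[fields[3].rsplit("/", 1)[1]] += 1
    return counts


def failure_rate(lines, status="502"):
    """Fraction of logged requests that ended with the given status."""
    counts = status_counts(lines)
    total = sum(counts.values())
    return counts[status] / total if total else 0.0


sample = [
    # Entry from the report above.
    "1610020935.756 43 10.0.251.170 TCP_REFRESH_FAIL_HIT/502 498 GET "
    "http://10.0.251.170:41080/path/ - DIRECT/10.0.251.170 text/html",
    # Hypothetical successful entry, for illustration only.
    "1610020936.100 5 10.0.251.170 TCP_HIT/200 1024 GET "
    "http://10.0.251.170:41080/path/ - NONE/- text/html",
]
print(failure_rate(sample))
```

On a real log, pass the open file object instead of `sample`, since iterating over it yields one line at a time.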
Thanks to enabling diagnostics with:
CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING http.*
I saw in var/log/trafficserver/traffic.out:
[Jan 7 13:02:15.757] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:2666 (main_handler)> (http) [5687] [HttpSM::main_handler, VC_EVENT_WRITE_COMPLETE]
[Jan 7 13:02:15.757] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1994 (state_send_server_request_header)> (http) [5687] [&HttpSM::state_send_server_request_header, VC_EVENT_WRITE_COMPLETE]
[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:2666 (main_handler)> (http) [5687] [HttpSM::main_handler, VC_EVENT_EOS]
[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1836 (state_read_server_response_header)> (http) [5687] [&HttpSM::state_read_server_response_header, VC_EVENT_EOS]
[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1923 (state_read_server_response_header)> (http_seq) Error parsing server response header
[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:5513 (handle_server_setup_error)> (http) [5687] [&HttpSM::handle_server_setup_error, VC_EVENT_EOS]
[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:3394 (HandleResponse)> (http_trans) [5687] [HttpTransact::HandleResponse]
[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:3395 (HandleResponse)> (http_seq) [5687] [HttpTransact::HandleResponse] Response received
[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:8497 (ink_cluster_time)> (http_trans) [ink_cluster_time] local: 1610020935, highest_delta: 0, cluster: 1610020935
[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:3402 (HandleResponse)> (http_trans) [5687] [HandleResponse] response_received_time: 1610020935
+++++++++ Incoming O.S. Response +++++++++
-- State Machine Id: 5687
HTTP/1.0 0
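To follow a single transaction through traffic.out, something like the sketch below can pull out the handler/event pairs for one state machine id. It assumes every debug entry sits on a single line (not wrapped by a mail client) and that the id and event appear as `[id] [Handler, VC_EVENT_...]`, as in the entries above. The regular expression and the function name are my own and are not part of TrafficServer.

```python
import re

# Matches e.g. "[5687] [&HttpSM::main_handler, VC_EVENT_EOS]".
EVENT_RE = re.compile(r"\[(\d+)\] \[&?([\w:]+), (VC_EVENT_\w+)\]")


def events_for(sm_id, lines):
    """Return (handler, event) pairs logged for one state machine id."""
    found = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match and int(match.group(1)) == sm_id:
            found.append((match.group(2), match.group(3)))
    return found


sample = [
    "[Jan 7 13:02:15.757] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:2666 "
    "(main_handler)> (http) [5687] [HttpSM::main_handler, "
    "VC_EVENT_WRITE_COMPLETE]",
    "[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1836 "
    "(state_read_server_response_header)> (http) [5687] "
    "[&HttpSM::state_read_server_response_header, VC_EVENT_EOS]",
    "[Jan 7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1923 "
    "(state_read_server_response_header)> (http_seq) Error parsing server "
    "response header",
]
for handler, event in events_for(5687, sample):
    print(handler, event)
```

For transaction 5687 this makes the order easy to see: the request header write completes, and the next event on the server connection is an EOS while TrafficServer is waiting for the response header.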
I can clearly see that setting
CONFIG proxy.config.http.send_http11_requests INT 0
reduces the number of failures to almost zero (but they still appear
occasionally).
As a workaround, while keeping HTTP/1.1 enabled, I set:
CONFIG proxy.config.http.connect_attempts_max_retries INT 3
CONFIG proxy.config.http.connect_attempts_max_retries_dead_server INT 1
With these settings my client is never served a 502 in this situation. With
TrafficServer 7 nothing is written to var/log/trafficserver/error.log,
while TrafficServer 8 logs:
20210111.14h30m34s CONNECT:[0] could not connect [CONNECTION_CLOSED] to 10.0.251.170 for 'http://10.0.251.170:41080/'
In both cases TrafficServer simply reconnects to the origin.
My questions are:
* Is this possibly a bug in TrafficServer (somewhat similar to
https://issues.apache.org/jira/browse/TS-3959)?
* Is it a misconfiguration of TrafficServer?
* Or is everything fine with TrafficServer, and the problem is really in my
origin software?
Thanks for the tips,
Regards,
Łukasz