Hi,
I have a question about the internals of the parent selection code. Once in a
while we see some interesting logs from a customer deployment that we can’t
explain.

Configuration details:
parent.config: Two origin servers are specified in the “parent” parameter (say 
origin1.com and origin2.com), and the parent selection algorithm (i.e. the 
“round_robin” parameter) is set to None.
proxy.config.http.parent_proxy.connect_attempts_timeout INT 5
proxy.config.http.parent_proxy.fail_threshold INT 10
proxy.config.http.parent_proxy.per_parent_connect_attempts INT 2
proxy.config.http.parent_proxy.retry_time INT 300
proxy.config.http.parent_proxy.total_connect_attempts INT 4

Version is 5.3.x

Here is a snippet of diags.log from the mid cache:
...
[Oct  7 03:21:13.396] Server {0x2acd40807700} NOTE: Failure threshold met, http parent proxy origin1.com:80 marked down
[Oct  7 03:21:13.468] Server {0x2acd40504700} NOTE: http parent proxy origin1.com:80 restored
[Oct  7 03:21:13.507] Server {0x2acd4100f700} NOTE: Parent initially marked as down origin1.com:80
[Oct  7 03:21:13.511] Server {0x2acd4100f700} NOTE: http parent proxy origin1.com::80 restored
[Oct  7 03:21:13.875] Server {0x2acd3bd46700} NOTE: Parent initially marked as down origin1.com::80
[Oct  7 03:21:14.745] Server {0x2acd40403700} NOTE: http parent proxy origin1.com::80 restored
[Oct  7 03:21:15.105] Server {0x2acd41110700} NOTE: Parent initially marked as down origin1.com::80
[Oct  7 03:21:16.083] Server {0x2acd41211700} NOTE: http parent proxy origin1.com::80 restored
[Oct  7 03:21:16.386] Server {0x2acd40807700} NOTE: Parent initially marked as down origin1.com::80

(The origin server hostnames are replaced with dummy values for security reasons.)

The perplexing part is the very quick sequence of “Parent initially marked as 
down” followed by “restored” log lines, without any “Failure threshold met…” 
between them.
Without going into details, our understanding of the parent selection 
algorithm/code is that this should not happen, at least not normally.
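
In case it helps anyone look for the same pattern in their own logs, here is a minimal Python sketch that flags each “Parent initially marked as down” line whose previous event for the same parent was “restored”, i.e. with no “Failure threshold met” in between. The message patterns are copied from the diags.log excerpt above; the function names are made up for illustration, and treating everything before the first colon as the parent name is an assumption made only so that the “:80” and “::80” forms group together.

```python
import re

# Message fragments taken from the diags.log excerpt in this email.
EVENT_PATTERNS = [
    ("threshold_met",
     re.compile(r"Failure threshold met, http parent proxy (\S+) marked down")),
    ("initially_down", re.compile(r"Parent initially marked as down (\S+)")),
    ("restored", re.compile(r"http parent proxy (\S+) restored")),
]
TIMESTAMP = re.compile(r"^\[(\w+\s+\d+ [\d:.]+)\]")


def parse_events(lines):
    """Return (timestamp, kind, parent) tuples for recognized log lines."""
    events = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if not ts:
            continue
        for kind, pattern in EVENT_PATTERNS:
            m = pattern.search(line)
            if m:
                # Assumption: the parent name is the part before the first colon.
                parent = m.group(1).split(":")[0]
                events.append((ts.group(1), kind, parent))
                break
    return events


def restored_then_down(events):
    """Find 'initially_down' events directly preceded by 'restored'
    for the same parent, with no 'threshold_met' in between."""
    last_kind = {}
    hits = []
    for ts, kind, parent in events:
        if kind == "initially_down" and last_kind.get(parent) == "restored":
            hits.append((ts, parent))
        last_kind[parent] = kind
    return hits


# First four lines of the excerpt above (wrapped lines rejoined).
SAMPLE = [
    "[Oct  7 03:21:13.396] Server {0x2acd40807700} NOTE: Failure threshold met, "
    "http parent proxy origin1.com:80 marked down",
    "[Oct  7 03:21:13.468] Server {0x2acd40504700} NOTE: http parent proxy "
    "origin1.com:80 restored",
    "[Oct  7 03:21:13.507] Server {0x2acd4100f700} NOTE: Parent initially marked "
    "as down origin1.com:80",
    "[Oct  7 03:21:13.511] Server {0x2acd4100f700} NOTE: http parent proxy "
    "origin1.com::80 restored",
]

if __name__ == "__main__":
    for ts, parent in restored_then_down(parse_events(SAMPLE)):
        print(ts, parent)
```

To run it against a real file, pass `open("diags.log")` to `parse_events` instead of `SAMPLE`.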

Any idea what could cause this behavior?

thanks,
Kapil

