[
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
mohan_zl updated TS-489:
Attachment: TS-489.patch
I think there are two tasks for us to fix this problem:
(1) There seems to be something wrong with the logic of ConnectionCollapsing.
(2) When clustering is enabled, ConnectionCollapsing on one server will get files
from another server, but the cluster does not handle this correctly, so
clustering does not work well with ConnectionCollapsing.
The patch Zhao Yongming committed fixes about 80 percent of the first problem,
and I did some code cleanup based on it. As the next step, we will try our best
to fix the second.
Seg Fault with Connection_Collapsing and clustering enabled.
Key: TS-489
URL: https://issues.apache.org/jira/browse/TS-489
Project: Traffic Server
Issue Type: Bug
Affects Versions: 2.0.0
Environment: Debian Lenny.
2.6.26-2-amd-64
Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
64G Memory
Reporter: Ricky Chan
Fix For: 2.1.6
Attachments: collapse1.trace, collapse2.trace, TS-489-zym-1.txt,
TS-489.patch, ts_489_testing.txt
Bug is easily reproduced, with the following setup.
Traffic Server 2.0.0
Enable Clustering (so you'll need two machines; make sure the cluster is
actually working) (LOCAL proxy.local.cluster.type INT 1)
Enable Connection Collapsing (CONFIG
proxy.config.connection_collapsing.hashtable_enabled INT 1)
Other changes to records.config which may or may not affect it are changes to
heuristics:
CONFIG proxy.config.http.cache.heuristic_min_lifetime INT 5
CONFIG proxy.config.http.cache.heuristic_max_lifetime INT 86400
CONFIG proxy.config.http.cache.heuristic_lm_factor FLOAT 0.000100
CONFIG proxy.config.http.cache.fuzz.time INT 240
CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.05
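Before running the test, it can help to confirm that both machines really have the two key settings above. The following is a minimal sketch, not part of Traffic Server: it assumes each records.config line has the form `SCOPE name TYPE value` as in the lines quoted above, and the helper names (parse_records, missing_settings) are hypothetical.

```python
# Settings this report says must be enabled to reproduce the crash.
EXPECTED = {
    "proxy.local.cluster.type": "1",
    "proxy.config.connection_collapsing.hashtable_enabled": "1",
}


def parse_records(text):
    """Parse records.config-style text into {name: (scope, type, value)}.

    Assumes whitespace-separated 'SCOPE name TYPE value' lines; blank
    lines, '#' comments, and malformed lines are skipped.
    """
    records = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue
        scope, name, rtype, value = parts
        records[name] = (scope, rtype, value.strip())
    return records


def missing_settings(text, expected=EXPECTED):
    """Return the sorted names whose value is absent or differs from expected."""
    records = parse_records(text)
    return sorted(
        name
        for name, want in expected.items()
        if name not in records or records[name][2] != want
    )
```

Calling missing_settings() on the contents of each node's records.config returns an empty list when both settings are present with the values listed in this report.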
From a 3rd machine, run apache benchmark (ab) with, say, -n
100, keep alive (-k) and -c 8000. I found it happens every time
above 8000. I fetched a file from a lighttpd origin which sent a
Cache-Control header of max-age=86400, so as to reduce hits on the origin. The
file is only 9 bytes.
Note: You need to set ulimit -n very high and set sysctl ip_local_port_range
to larger than the defaults to be able to run the test. I did ulimit -n 100 and
sysctl -w net.ipv4.ip_local_port_range="1024 65000" to be able to run ab.
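The reasoning behind that note is that each concurrent ab connection needs one file descriptor and one local port on the client. A rough sketch of that check follows; the function names are hypothetical, and it assumes a Unix client where Python's `resource` module is available.

```python
import resource


def ephemeral_ports(low, high):
    """Number of local ports in an inclusive ip_local_port_range."""
    return high - low + 1


def client_limits_ok(concurrency, port_low, port_high, nofile=None):
    """Check whether the client can hold `concurrency` open connections.

    nofile defaults to the current soft RLIMIT_NOFILE (what `ulimit -n`
    reports). Each connection consumes one descriptor and one local port.
    """
    if nofile is None:
        nofile, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fds_ok = nofile == resource.RLIM_INFINITY or nofile > concurrency
    ports_ok = ephemeral_ports(port_low, port_high) >= concurrency
    return fds_ok and ports_ok
```

The check is deliberately coarse: it ignores descriptors already in use by the process and ports held in TIME_WAIT, both of which reduce the real headroom.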
With either clustering or connection collapsing disabled, the program no longer crashes.
I then added a GDB wrapper around traffic_server, and it clearly shows that the
connection collapsing API is at fault here.
I'll add these traces as attachments.