[jira] Updated: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2011-01-25 Thread mohan_zl (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mohan_zl updated TS-489:


Attachment: code_clean_up.patch

The connection collapsing logic breaks the logic of proxy/http and 
iocore/cache, and in practice it does not work very well. 
This patch nukes all of the connection collapsing code.

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
Assignee: mohan_zl
Priority: Critical
 Fix For: 2.1.6

 Attachments: code_clean_up.patch, collapse1.trace, collapse2.trace, 
 TS-489-zym-1.txt, TS-489.patch, ts_489_testing.txt


 The bug is easily reproduced with the following setup:
 Traffic Server 2.0.0
 Enable clustering (you'll need two machines; make sure the cluster is 
 actually working) (LOCAL proxy.local.cluster.type INT 1)
 Enable connection collapsing (CONFIG 
 proxy.config.connection_collapsing.hashtable_enabled INT 1)
 Other changes to records.config which may or may not affect it are changes to 
 the heuristics:
 CONFIG proxy.config.http.cache.heuristic_min_lifetime INT 5
 CONFIG proxy.config.http.cache.heuristic_max_lifetime INT 86400
 CONFIG proxy.config.http.cache.heuristic_lm_factor FLOAT 0.000100
 CONFIG proxy.config.http.cache.fuzz.time INT 240
 CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.05
 From a third machine, run Apache Benchmark (ab) with, say, -n 
 100, keep-alive (-k) and -c 8000.  I found it happens every 
 time above 8000.  I just fetched a file from the origin (lighttpd) which had a 
 Cache-Control header of max-age=86400, to reduce hits on the origin.  The 
 file is only 9 bytes.
 Note: you need to set ulimit -n very high and widen sysctl ip_local_port_range 
 beyond the defaults to be able to run the test. I did ulimit -n 100 and 
 sysctl -w net.ipv4.ip_local_port_range="1024 65000" to be able to run ab.
 With either clustering or connection collapsing disabled, the program no 
 longer crashes.
 I then added a GDB wrapper around traffic_server, and it clearly shows that the 
 connection collapsing API is at fault here.
 I'll add these traces as attachments.
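
The setup described above can be collected into a small helper script. This is a rough sketch only: the function names are made up, the URL and request count in the usage note are placeholders, and the request count is left as a parameter because ab refuses to run with a concurrency level higher than the total number of requests.

```shell
#!/bin/sh
# Sketch only: collects the settings and the load command from the report.
# Function names are hypothetical; nothing here touches a live system.

# Print the records.config lines listed in the report.
ts489_settings() {
  cat <<'EOF'
LOCAL proxy.local.cluster.type INT 1
CONFIG proxy.config.connection_collapsing.hashtable_enabled INT 1
CONFIG proxy.config.http.cache.heuristic_min_lifetime INT 5
CONFIG proxy.config.http.cache.heuristic_max_lifetime INT 86400
CONFIG proxy.config.http.cache.heuristic_lm_factor FLOAT 0.000100
CONFIG proxy.config.http.cache.fuzz.time INT 240
CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.05
EOF
}

# Print (do not run) the ab command line for a given request count,
# concurrency and URL. ab rejects a concurrency above the request count,
# so that case is refused here as well.
ts489_ab_command() {
  requests=$1
  concurrency=$2
  url=$3
  if [ "$concurrency" -gt "$requests" ]; then
    echo "concurrency must not exceed request count" >&2
    return 1
  fi
  printf 'ab -k -n %s -c %s %s\n' "$requests" "$concurrency" "$url"
}
```

For example, `ts489_settings` could be appended to records.config on each cluster node, and the line printed by `ts489_ab_command 100000 8000 http://proxy.example:8080/test.txt` run from the client machine; both the request count and the URL in that call are invented for illustration.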

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2011-01-24 Thread Leif Hedstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Hedstrom updated TS-489:
-

Assignee: mohan_zl

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
Assignee: mohan_zl
Priority: Critical
 Fix For: 2.1.6

 Attachments: collapse1.trace, collapse2.trace, TS-489-zym-1.txt, 
 TS-489.patch, ts_489_testing.txt





[jira] Updated: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2011-01-22 Thread mohan_zl (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mohan_zl updated TS-489:


Attachment: TS-489.patch

I think there are two tasks for us to fix this problem:
(1) There seems to be something wrong with the logic of ConnectionCollapsing.
(2) When clustering is enabled, ConnectionCollapsing on one server will get files 
from another server, but something in the cluster code does not handle this 
correctly, so the cluster cannot work well with ConnectionCollapsing.
The patch Zhao Yongming committed fixes about 80 percent of the first problem, 
and I did some code cleanup based on it. As the next step, we will try our best 
to fix the second. 

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.1.6

 Attachments: collapse1.trace, collapse2.trace, TS-489-zym-1.txt, 
 TS-489.patch, ts_489_testing.txt





[jira] Updated: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2011-01-20 Thread Zhao Yongming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yongming updated TS-489:
-

Attachment: TS-489-zym-1.txt

This patch changes how scheduling is used in collapsing. It makes 
collapsing.rww_wait_time take effect, and makes a failed hash table insert go 
back into collapsing the next time around. It will not solve the cluster 
problem, but it makes collapsing much more usable in a single-server 
environment and makes cluster debugging much clearer.

From what I have learned from the buggy code, I really want to move collapsing 
out of the DNS lookup transaction. Why put a decision function in DNS? It also 
looks like the author was not familiar with most of the http/threading code, so 
we really need to get it reviewed/reimplemented.

After my patch, I identified one cluster write bug that makes collapsing 
misbehave. Still investigating.

I will leave the collapsing review/reimplementation to mohan_zl; it is harder 
for me to write code than to just change a few lines.

cheers

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.1.6

 Attachments: collapse1.trace, collapse2.trace, TS-489-zym-1.txt, 
 ts_489_testing.txt





[jira] Updated: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-11-10 Thread Leif Hedstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Hedstrom updated TS-489:
-

Fix Version/s: (was: 3.1)
   2.1.5

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.1.5

 Attachments: collapse1.trace, collapse2.trace, ts_489_testing.txt





[jira] Updated: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-27 Thread mohan_zl (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mohan_zl updated TS-489:


Attachment: ts_489_testing.txt

We ran an experiment with the TS 2.0.1 stable version, following your cluster 
setup and changing the relevant settings. The results demonstrate that full 
cluster mode and the connection_collapsing feature can be used together in 
TS 2.0.1. The attached file shows how each change affects the results.

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.3.0

 Attachments: collapse1.trace, collapse2.trace, ts_489_testing.txt





[jira] Updated: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-14 Thread Ricky Chan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricky Chan updated TS-489:
--

Attachment: collapse2.trace
collapse1.trace

Traces generated by:

Rename traffic_server to traffic_server.real.

Create a shell script named traffic_server containing:

gdb -ex 'r' -ex 'bt' -ex 'q' --args /usr/bin/traffic_server.real $* > /tmp/ts.trace 2>&1

TS was then started up via traffic_cop as normal.
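
Spelled out as a script, the wrapper might look like the sketch below. It assumes the tail of the command above is an output redirection into /tmp/ts.trace; the environment variable names and the function name are made up for illustration, and "$@" is used in place of $* so that arguments containing spaces survive.

```shell
#!/bin/sh
# Sketch of the wrapper described above. Paths are the ones from the comment;
# the GDB, REAL_BIN and TRACE_FILE override variables are hypothetical.
GDB="${GDB:-gdb}"
REAL_BIN="${REAL_BIN:-/usr/bin/traffic_server.real}"
TRACE_FILE="${TRACE_FILE:-/tmp/ts.trace}"

# Run the real binary under gdb: 'r' starts it, 'bt' prints a backtrace once
# it stops (for example on SIGSEGV), and 'q' then quits gdb.
# Both stdout and stderr are written to the trace file.
run_under_gdb() {
  "$GDB" -ex 'r' -ex 'bt' -ex 'q' --args "$REAL_BIN" "$@" > "$TRACE_FILE" 2>&1
}
```

Installed as /usr/bin/traffic_server, the script would end with a call to `run_under_gdb "$@"` so that the arguments supplied by traffic_cop are passed through unchanged.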

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Attachments: collapse1.trace, collapse2.trace





[jira] Updated: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-14 Thread Leif Hedstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Hedstrom updated TS-489:
-

Fix Version/s: 2.3.0

Moving this to v2.3.0, since we don't expect / claim Clustering to be fully 
functional until then.

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.3.0

 Attachments: collapse1.trace, collapse2.trace

