[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2011-03-10 Thread Ricky Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005040#comment-13005040
 ] 

Ricky Chan commented on TS-489:
---

For me neither of these is ideal. I already have read-while-writer enabled; 
it does what it says, but it can still flood origins when simultaneous 
requests arrive for the same un-cached object (because the object is not yet 
known to be in the process of being retrieved by any of the connections).

I understand your point though, so I'll look at ways of re-organising the cache 
hierarchy to mitigate origin hits until better features become available.

I have currently tweaked the logic to almost never refresh content at all, 
because under load (a few thousand hits for the same cache object) it hits 
the origin way too much.

I haven't found a happy medium in the fuzziness (which isn't documented) 
whereby it won't hit origins on sudden surges for a cached object but still 
allows refreshing in quiet periods. Also, it doesn't help if the object has 
never been seen before.
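
To make the behaviour being asked for concrete, here is a minimal Python sketch of the general idea behind connection collapsing (request coalescing): concurrent requests for the same un-cached key share a single origin fetch. It is an illustration under stated assumptions, not Traffic Server's implementation; `CollapsingFetcher`, `fetch_from_origin` and `demo` are hypothetical names.

```python
import threading
import time
from concurrent.futures import Future


class CollapsingFetcher:
    """Collapse concurrent fetches of one key into a single origin request."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # hypothetical callable: key -> body
        self._lock = threading.Lock()
        self._in_flight = {}             # key -> Future shared by all waiters
        self.joined = 0                  # followers that reused an in-flight fetch

    def get(self, key):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                # First request for this key: register before going to origin.
                future = Future()
                self._in_flight[key] = future
            else:
                self.joined += 1
        if leader:
            try:
                future.set_result(self._fetch(key))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    self._in_flight.pop(key, None)
        return future.result()


def demo(clients=50):
    """Run `clients` simultaneous requests; return (origin_hits, responses)."""
    origin_hits = []

    def slow_origin(key):
        origin_hits.append(key)
        deadline = time.monotonic() + 5.0
        # Keep the fetch open until every other client has joined it.
        while fetcher.joined < clients - 1 and time.monotonic() < deadline:
            time.sleep(0.001)
        return b"body-of-" + key.encode()

    fetcher = CollapsingFetcher(slow_origin)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(fetcher.get("/obj")))
        for _ in range(clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(origin_hits), len(results)
```

The point of the sketch is that the in-flight entry is registered before the origin is contacted, so the never-seen-before case described above is covered as well.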




 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
Assignee: mohan_zl
Priority: Critical
 Fix For: 2.1.6

 Attachments: TS-489-zym-1.txt, TS-489.patch, code_clean_up.patch, 
 collapse1.trace, collapse2.trace, ts_489_testing.txt


 Bug is easily reproduced with the following setup.
 Traffic Server 2.0.0
 Enable Clustering (so you'll need two machines; make sure the cluster is 
 actually working) (LOCAL proxy.local.cluster.type INT 1)
 Enable Connection Collapsing (CONFIG 
 proxy.config.connection_collapsing.hashtable_enabled INT 1)
 Other changes to records.config which may or may not affect it are changes to 
 heuristics:
 CONFIG proxy.config.http.cache.heuristic_min_lifetime INT 5
 CONFIG proxy.config.http.cache.heuristic_max_lifetime INT 86400
 CONFIG proxy.config.http.cache.heuristic_lm_factor FLOAT 0.000100
 CONFIG proxy.config.http.cache.fuzz.time INT 240
 CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.05
 From a 3rd machine, run apache benchmark (ab) and request with, say, -n 
 100 with keep alive (-k) and, say, -c 8000.  I found it happens all the 
 time above 8000.  I just fetched a file from an origin running lighttpd which 
 had a cache-control header of max-age 86400, so as to reduce hitting the 
 origin.  Size of file is 9 bytes only.
 Note: You need to set ulimit -n very high and set sysctl ip_local_port_range 
 to larger than the defaults to be able to run the test. I did ulimit -n 100 and 
 had sysctl -w net.ipv4.ip_local_port_range="1024 65000" to be able to run AB.
 With clustering or connection collapsing disabled, the program no longer seg 
 faults.
 I then added a GDB wrapper around traffic_server and it clearly shows it's the 
 connection collapsing API which is at fault here.
 I'll add these traces as attachments.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2011-03-09 Thread Ricky Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004548#comment-13004548
 ] 

Ricky Chan commented on TS-489:
---

Does this mean Connection collapsing will no longer be supported by Traffic 
Server in any future releases?
This is a core feature I use to reduce the number of hits to the origin server.



[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2011-03-09 Thread Leif Hedstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004565#comment-13004565
 ] 

Leif Hedstrom commented on TS-489:
--

Read-while-writer accomplishes the same, and actually works. And also take a 
look at the fuzzy logic which will allow refreshing content before it goes 
stale.

That much said, future versions will have even better features / support for 
this.
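
As a rough model of the fuzzy logic mentioned here, the Python sketch below assumes that each request arriving within the last `fuzz_time` seconds of an object's freshness triggers an early refresh independently with probability `fuzz_probability`. That reading of the `fuzz.time` and `fuzz.probability` settings from the issue description is an assumption, not taken from the Traffic Server source, and both function names are hypothetical.

```python
import random


def should_refresh_early(seconds_until_stale, fuzz_time, fuzz_probability,
                         rng=random):
    """Return True if this request should trigger an early revalidation.

    Assumed model: inside the last `fuzz_time` seconds of freshness, each
    request independently refreshes with probability `fuzz_probability`.
    """
    if seconds_until_stale <= 0:
        return True    # already stale: must revalidate
    if seconds_until_stale > fuzz_time:
        return False   # outside the fuzz window: serve from cache
    return rng.random() < fuzz_probability


def expected_early_refreshes(requests_in_window, fuzz_probability):
    """Expected number of early origin hits for requests inside the window."""
    return requests_in_window * fuzz_probability
```

Under this model, with the reported probability of 0.05, a surge of 3000 requests inside the fuzz window would be expected to produce about 150 early refreshes unless something collapses them, which would be consistent with the origin being hit heavily under load.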



[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2011-02-08 Thread mohan_zl (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992262#comment-12992262
 ] 

mohan_zl commented on TS-489:
-

I'm sorry, I think that is not a bug in the old code, just my fault when moving 
code.





[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2011-01-14 Thread Ricky Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981770#action_12981770
 ] 

Ricky Chan commented on TS-489:
---

I should have time to re-run the tests first with 2.0.1 then with 2.1.5 at the 
end of the month if that's okay.

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.1.6

 Attachments: collapse1.trace, collapse2.trace, ts_489_testing.txt





[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-27 Thread Ricky Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925351#action_12925351
 ] 

Ricky Chan commented on TS-489:
---

I'll retest with 2.0.1, although I did do a diff between 2.0.0 and 2.0.1 when 
I first encountered this issue and didn't see any code changes which, IMO, were 
related to this. I mainly just saw the 3 fixes mentioned for the release.

Personally I'll repeat the test with -c 20,000 and have it run for a long 
period.

Also keep an eye on the origin, as a telltale sign is a sudden rush of 1000's 
of direct connections to the origin.

I'll retest with 2.0.1 with my setup and I'll get back to you.


Ricky

p.s. Zhao, I'm happy to test later versions when I get a chance to.

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.3.0

 Attachments: collapse1.trace, collapse2.trace, ts_489_testing.txt





[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-27 Thread Zhao Yongming (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925360#action_12925360
 ] 

Zhao Yongming commented on TS-489:
--

Cool.
Mohan and I have set up a test on v2.0.1 according to your directions, but we 
don't see the problem you got.

btw, the cluster function may not work in trunk due to TS-390.

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.3.0

 Attachments: collapse1.trace, collapse2.trace, ts_489_testing.txt





[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-27 Thread Ricky Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925381#action_12925381
 ] 

Ricky Chan commented on TS-489:
---

Okay, I did a quick re-run of my tests using the 2.0.1 stable release.

I ran this for about 30 minutes, with -c 20,000.

Here are my results:

Firstly, it no longer crashes and AB shows successful results. However, 
looking at the origin logs, I can see that caching is not working as 
expected.  Firstly, I get 1,000's of queries to the origins, i.e. connection 
collapsing is no longer obeyed.  Secondly, every few seconds it makes more 
queries even though the cache max-age is set to 86400 and the heuristics are 
set to a rate where it shouldn't come back that quickly.

I then examined the TS logs, and I can see a load of TCP_MISS's.

I then repeated the test with clustering disabled.  No hits to the origin at 
all (as it has a cached version already).  I then purged the content from the 
cache, and I saw 1 connection to the origin as expected, even though I was 
requesting 20,000 copies of it (so collapsing is working as expected).

So these 2 features in 2.0.1 (albeit without crashing) are not working 
properly together in my tests, and hitting the origin in those volumes is not 
acceptable for my downstream origin owners.

Mohan, check your logs (origin and TS) and make sure they have behaved as 
expected, because I suspect not.
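
One way to do that check on the TS side is to tally cache result codes. The Python sketch below assumes a squid-style access log in which the fourth whitespace-separated field looks like `TCP_MISS/200`; the sample lines and the host `origin.example` are made up for illustration, and `tally_cache_results` is a hypothetical helper.

```python
from collections import Counter


def tally_cache_results(lines):
    """Count cache result codes such as TCP_HIT or TCP_MISS.

    Assumes the fourth whitespace-separated field is RESULT/STATUS
    (squid-style layout); lines that do not match are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or "/" not in fields[3]:
            continue
        result, _status = fields[3].split("/", 1)
        counts[result] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    "1288180000.123 2 10.0.0.5 TCP_MISS/200 310 GET "
    "http://origin.example/test.txt - DIRECT/10.0.0.9 text/plain",
    "1288180000.125 0 10.0.0.5 TCP_HIT/200 310 GET "
    "http://origin.example/test.txt - NONE/- text/plain",
    "1288180000.126 1 10.0.0.5 TCP_MISS/200 310 GET "
    "http://origin.example/test.txt - DIRECT/10.0.0.9 text/plain",
]
```

With collapsing and caching behaving as described for the clustering-disabled run, TCP_MISS should be a tiny fraction of the total for a single hot object; a large TCP_MISS count matches the flood seen at the origin.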

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.3.0

 Attachments: collapse1.trace, collapse2.trace, ts_489_testing.txt





[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-15 Thread Ricky Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921285#action_12921285
 ] 

Ricky Chan commented on TS-489:
---

FYI:

https://issues.apache.org/jira/browse/TS-394

It talks more about clustering, which is known to work in 2.0.0 but is broken 
in other versions.





 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Fix For: 2.3.0

 Attachments: collapse1.trace, collapse2.trace





[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-14 Thread Ricky Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920882#action_12920882
 ] 

Ricky Chan commented on TS-489:
---

Just re-read my submission, need to make 1 point very clear.

With clustering disabled (i.e. 3) or connection collapsing disabled (i.e. 0), 
traffic server no longer seg faults.

Many Thanks.




 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Attachments: collapse1.trace, collapse2.trace





[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-14 Thread Ricky Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920887#action_12920887
 ] 

Ricky Chan commented on TS-489:
---

An additional side effect shows up even while it is running (not yet crashed).

The logs indicate 1's of cache misses. On the origin I can then see 1000's of 
connections coming in for the file.  

When I disable clustering (leaving collapsing on), it behaves as expected 
(1 connection to origin only).

It seems collapsing and clustering do not want to play together.

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Attachments: collapse1.trace, collapse2.trace





[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

2010-10-14 Thread Leif Hedstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920993#action_12920993
 ] 

Leif Hedstrom commented on TS-489:
--

Couple of thoughts here:

1) I'm fairly certain that clustering never worked at all in v2.0.x. Can you 
run the same tests with v2.1.3 or trunk from SVN?

2) Did you increase the max number of connections? You are encountering the 
problem right around the time I suspect you are hitting the default limits. Not 
saying that's an excuse to segfault, but it'll help debugging.


You can increase the max number of connections in records.config, with

CONFIG proxy.config.net.connections_throttle INT 5


TS should automatically (as root) increase the rlimits accordingly, so the 
manual use of ulimit should not be necessary. However, this might be an area 
that has been improved / fixed since the v2.0.x release, so again it'd be great 
to try v2.3.1.
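
Whether the file descriptor limit is in play can be checked before a run. The sketch below uses Python's standard `resource` module (Unix only) to compare the soft RLIMIT_NOFILE of the current process with a planned concurrency level. `fd_limit_ok` is a hypothetical helper and the `headroom` default is an arbitrary assumption, not a Traffic Server value.

```python
import resource


def fd_limit_ok(concurrency, headroom=64):
    """Return (ok, soft_limit) for the current process.

    `headroom` is an arbitrary allowance for log files, origin sockets and
    so on; it is an assumption, not a value taken from Traffic Server.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True, soft
    return soft >= concurrency + headroom, soft
```

Note that this reports the limit of the process that runs it (for example the shell that will launch ab), not the limit that traffic_server ends up with after it adjusts its own rlimits.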

 Seg Fault with Connection_Collapsing and clustering enabled.
 

 Key: TS-489
 URL: https://issues.apache.org/jira/browse/TS-489
 Project: Traffic Server
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Debian Lenny.
 2.6.26-2-amd-64
 Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
 64G Memory
Reporter: Ricky Chan
 Attachments: collapse1.trace, collapse2.trace

