[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393049#comment-14393049 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

Should we revisit this again? Is anyone on this Jira interested, and if so, can you provide a patch that applies to the current master branch?

apply time-wheel scheduler about event system
---------------------------------------------

Key: TS-1405
URL: https://issues.apache.org/jira/browse/TS-1405
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
Labels: A, C, review
Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, linux_time_wheel_v11jp.patch, linux_time_wheel_v12.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, linux_time_wheel_v9jp.patch, patch12_test.pdf

As more and more events accumulate in the event system scheduler, its performance gets worse. This is the reason why we use inactivecop to handle keep-alive. The new scheduler is a time wheel, which has better time complexity (O(1)).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
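To make the O(1) claim in the issue description concrete, here is a minimal sketch of a single-level hashed timing wheel. It is an illustration only, not the code from any of the attached patches: the class name `TimeWheel`, its methods, the 8-slot size and the 5 ms tick are all assumptions chosen for the example.

```python
class TimeWheel:
    """Single-level hashed timing wheel (illustrative, not the TS-1405 patch)."""

    def __init__(self, slots=8, tick_ms=5):
        # One bucket per slot: event_id -> (remaining_rounds, callback).
        self.slots = [dict() for _ in range(slots)]
        self.tick_ms = tick_ms
        self.current = 0      # slot the wheel "hand" currently points at
        self.where = {}       # event_id -> slot index, enables O(1) cancel
        self.next_id = 0

    def schedule(self, delay_ms, callback):
        """O(1): compute the target slot and drop the event into it."""
        ticks = max(1, -(-delay_ms // self.tick_ms))   # ceiling, at least 1
        slot = (self.current + ticks) % len(self.slots)
        rounds = (ticks - 1) // len(self.slots)        # full turns to skip
        event_id = self.next_id
        self.next_id += 1
        self.slots[slot][event_id] = (rounds, callback)
        self.where[event_id] = slot
        return event_id

    def cancel(self, event_id):
        """O(1): look up the slot and remove the event from it."""
        slot = self.where.pop(event_id, None)
        if slot is None:
            return False
        del self.slots[slot][event_id]
        return True

    def tick(self):
        """Advance one slot; cost depends only on that slot's population."""
        self.current = (self.current + 1) % len(self.slots)
        bucket = self.slots[self.current]
        due = []
        for event_id, (rounds, callback) in list(bucket.items()):
            if rounds == 0:
                del bucket[event_id]
                del self.where[event_id]
                due.append(callback)
            else:
                bucket[event_id] = (rounds - 1, callback)
        for callback in due:
            callback()
        return len(due)


log = []
wheel = TimeWheel(slots=8, tick_ms=5)
wheel.schedule(10, lambda: log.append("a"))      # fires after 2 ticks
wheel.schedule(50, lambda: log.append("b"))      # 10 ticks: wraps the wheel once
c = wheel.schedule(10, lambda: log.append("c"))
wheel.cancel(c)                                  # never fires
for _ in range(10):
    wheel.tick()
print(log)
```

Scheduling and cancelling touch a single bucket regardless of how many events are pending, which is the property the description refers to. The price is a periodic tick even when nothing is due, which is worth keeping in mind when reading the benchmark discussion in this thread.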
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922216#comment-13922216 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

Let me do another round of benchmarks, to make sure the latency issues are addressed.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917710#comment-13917710 ]

Bin Chen commented on TS-1405:
------------------------------

The original event scheduling policy schedules events whose remaining time (event->timeout_at - now) is within 5ms. The new patch (v12) now schedules these events too.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678775#comment-13678775 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

Just to make sure I wasn't affected by the accept-thread (well, non-accept-thread) issue, I reran the tests with accept-thread enabled. I still see similarly poor performance with the last patch here:

Current master:
{code}
4794208 fetches on 4946 conns, 300 max parallel, 4.794208E+08 bytes, in 30 seconds
100 mean bytes/fetch
159806.9 fetches/sec, 1.598069E+07 bytes/sec
msecs/connect: 0.536 mean, 3.346 max, 0.087 min
msecs/first-response: 1.670 mean, 247.579 max, 0.097 min
{code}

With time-wheel patch:
{code}
http_load -parallel 60 -seconds 30 -keep_alive 1000 URL.small
3238265 fetches on 3354 conns, 300 max parallel, 3.238265E+08 bytes, in 30 seconds
100 mean bytes/fetch
107942.2 fetches/sec, 1.079422E+07 bytes/sec
msecs/connect: 0.290 mean, 2.753 max, 0.076 min
msecs/first-response: 2.689 mean, 76.218 max, 0.084 min
{code}

I could probably deal with the fact that it's lower throughput, but 33% fewer requests handled, and still 60% higher latency? I'll try to investigate this further next week. I'd really like to see the scalability improvements, but not at this significant loss of throughput / performance for use cases with few connections.
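The "33% fewer requests" and "60% higher latency" figures can be rechecked from the two http_load summaries quoted in this comment. The short calculation below uses only the fetches/sec and mean first-response values from those runs; the helper name `pct_change` is just for illustration.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Values copied from the two http_load summaries (30 second runs).
master = {"fetches_per_sec": 159806.9, "first_response_ms": 1.670}
patched = {"fetches_per_sec": 107942.2, "first_response_ms": 2.689}

throughput_change = pct_change(master["fetches_per_sec"], patched["fetches_per_sec"])
latency_change = pct_change(master["first_response_ms"], patched["first_response_ms"])

print(f"throughput: {throughput_change:+.1f}%")
print(f"mean first-response latency: {latency_change:+.1f}%")
```

This works out to roughly a third less throughput and about 61% higher mean first-response time, in line with the numbers quoted in the comment.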
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678784#comment-13678784 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

So, probing around a little more, it's definitely some sort of resource contention I think. If I increase the number of active, concurrent clients to 1000, I get the throughput I expect (actually, it does slightly more, almost 170,000 QPS), but latency is much, much worse (about 5ms vs 1.7ms). What do you guys think?
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631562#comment-13631562 ]

Bin Chen commented on TS-1405:
------------------------------

url.txt: 50 URLs in total. The last number in each URL is the size of the object.
{code}
http://ts.cn:8080/ts/20400
http://ts.cn:8080/ts/20401
http://ts.cn:8080/ts/20402
..
http://ts.cn:8080/ts/20448
http://ts.cn:8080/ts/20449
{code}
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631560#comment-13631560 ]

Bin Chen commented on TS-1405:
------------------------------

http_load test:

Very important:
* Disable the RAM cache (otherwise the test may not be under control).
* Run http_load several times, because the first result may not be precise.

git master:
{code}
[root@test69 ~]# http_load -parallel 100 -seconds 60 -keep_alive 100 ./url.txt
331224 fetches on 3325 conns, 100 max parallel, 6.76508e+09 bytes, in 60 seconds
20424.5 mean bytes/fetch
5520.4 fetches/sec, 1.12751e+08 bytes/sec
msecs/connect: 0.134652 mean, 0.846 max, 0.049 min
msecs/first-response: 16.4409 mean, 89.136 max, 0.3 min
HTTP response codes:
  code 200 -- 331224
{code}

git master + linux_time_wheel_v11jp.patch:
{code}
[root@test69 ~]# http_load -parallel 100 -seconds 60 -keep_alive 100 ./url.txt
339305 fetches on 3408 conns, 100 max parallel, 6.93014e+09 bytes, in 60 seconds
20424.5 mean bytes/fetch
5655.08 fetches/sec, 1.15502e+08 bytes/sec
msecs/connect: 0.135805 mean, 0.561 max, 0.052 min
msecs/first-response: 15.9165 mean, 78.146 max, 0.28 min
HTTP response codes:
  code 200 -- 339305
{code}
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631746#comment-13631746 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

First, I've run the tests at both 1 minute and 5 minute durations, and it makes no difference whatsoever for either patch. How long do I have to run it for?

Your second comment, about disabling the RAM cache, makes no sense. I'm not trying to test the disk I/O, I'm testing that we don't have a regression in the processing of simple requests.

Finally, the test you are running is nothing similar to what I do. You have to change:

1) 100 bytes / object (or something smallish, such that each response at least fits in one TCP segment). I run 100 bytes, because otherwise I become GigE NIC bound (and that's not what I'm testing).
2) Run at least 3x instances of http_load at the same time. That gives a total of 300 clients. You should run enough clients such that the ATS box gets bottlenecked on some resource (CPU in my case).
3) Your hardware is probably way, way (I mean, *way*) more powerful than what I have. I have a quad core i7 with no NUMA (so single socket) and only 1x GigE network.

I'll run some more tests if you have any ideas of what I should change (other than turning off the RAM cache, that's just not useful). I have one idea of what could be causing this, so I'll do some more tests today.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631977#comment-13631977 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

A couple more metrics, related to CPU usage and context switching:

Without the patch, doing a 60s run, I see
{code}
CPU      USER    NICE     SYS    IDLE    Wait     IRQ    SIRQ
-----  ------  ------  ------  ------  ------  ------  ------
cpu    445.5%    0.0%  161.5%  104.4%    0.1%   12.7%   73.6%
cpu0     0.9%    0.0%   17.6%    0.0%    0.0%    9.6%   71.6%
cpu1    61.7%    0.0%   20.1%   17.1%    0.1%    0.5%    0.3%
cpu2    63.0%    0.0%   20.5%   15.5%    0.0%    0.4%    0.3%
cpu3    64.5%    0.0%   21.3%   13.2%    0.0%    0.5%    0.3%
cpu4    67.2%    0.0%   22.4%    9.5%    0.0%    0.4%    0.3%
cpu5    65.8%    0.0%   21.4%   11.9%    0.0%    0.5%    0.3%
cpu6    62.4%    0.0%   20.5%   16.1%    0.0%    0.4%    0.3%
cpu7    60.1%    0.0%   17.9%   21.2%    0.0%    0.3%    0.3%

Context switches/sec: 13225.8
Interrupts/sec: 51887.3
{code}

With the patch, again a 60s run, I see
{code}
CPU      USER    NICE     SYS    IDLE    Wait     IRQ    SIRQ
-----  ------  ------  ------  ------  ------  ------  ------
cpu    277.8%    0.0%  135.0%  305.6%    0.2%   14.9%   60.7%
cpu0     7.0%    0.0%   22.3%    0.0%    0.0%   12.2%   58.5%
cpu1    32.8%    0.0%   14.1%   51.2%    0.1%    0.4%    0.3%
cpu2    34.2%    0.0%   14.7%   49.6%    0.0%    0.4%    0.3%
cpu3    33.1%    0.0%   14.3%   50.9%    0.0%    0.4%    0.3%
cpu4    40.2%    0.0%   17.0%   41.3%    0.0%    0.4%    0.3%
cpu5    39.3%    0.0%   16.6%   42.7%    0.0%    0.4%    0.3%
cpu6    38.9%    0.0%   16.5%   43.1%    0.0%    0.4%    0.3%
cpu7    52.4%    0.0%   19.4%   26.7%    0.0%    0.3%    0.3%

Context switches/sec: 37947.0
Interrupts/sec: 64170.1
{code}

Besides not being able to use as much of the CPU with the patch, also notice the 3x increase in context switches. That is probably part of the problem, I think.

I am aware that my box is pinning core 0 with IRQs; I honestly don't know why my modern Linux / FC distribution isn't balancing the IRQs. But it's the same problem for both runs. :)
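The two observations in this comment (less CPU used, roughly 3x the context switches) follow directly from the aggregate `cpu` rows and the context-switch rates above. The snippet below only restates those numbers; the dictionary layout and the `busy` helper are illustrative, and "busy" is taken here to mean USER + SYS summed over the 8 logical CPUs.

```python
# Aggregate "cpu" rows and context-switch rates from the two 60s runs.
master = {"user": 445.5, "sys": 161.5, "idle": 104.4, "ctx_per_sec": 13225.8}
patched = {"user": 277.8, "sys": 135.0, "idle": 305.6, "ctx_per_sec": 37947.0}

def busy(sample):
    """USER + SYS, in percent of one CPU (8 logical CPUs, so 800% is the cap)."""
    return sample["user"] + sample["sys"]

ctx_ratio = patched["ctx_per_sec"] / master["ctx_per_sec"]

print(f"busy CPU without patch: {busy(master):.1f}%")
print(f"busy CPU with patch:    {busy(patched):.1f}%")
print(f"context switches: {ctx_ratio:.2f}x")
```

That is about 607% versus 413% of CPU in user plus system time, and a context-switch rate roughly 2.9 times higher with the patch.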
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631983#comment-13631983 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

The script I run to measure this during a run is available at https://github.com/zwoop/scripts/blob/master/procdelta.py .

For running http_load, I use a very simple wrapper around it, like
{code}
#!/bin/sh
PAR=${1:-100}
TIME=${2:-10}
KA=${3:-100}
URL=${4:-URL.small}

echo http_load -parallel $PAR -seconds $TIME -keep_alive $KA $URL
rm -f /tmp/ONE /tmp/TWO /tmp/THREE

# NOTE: the '>' redirections and trailing '&' below are assumed; the
# archived copy of this script dropped them.
# Run three http_load instances concurrently, one output file each.
http_load -parallel $PAR -seconds $TIME -keep_alive $KA $URL > /tmp/ONE &
http_load -parallel $PAR -seconds $TIME -keep_alive $KA $URL > /tmp/TWO &
http_load -parallel $PAR -seconds $TIME -keep_alive $KA $URL > /tmp/THREE

# Let the background instances finish writing, then merge the results.
sleep 3
merge_stats.pl /tmp/ONE /tmp/TWO /tmp/THREE
{code}

I run it like
{code}
$ benchit.sh 100 300 100 /tmp/URL
{code}

The URL has a 100 byte body; I have to make it that small to not become NIC bound.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631087#comment-13631087 ]

Zhao Yongming commented on TS-1405:
-----------------------------------

I did some stress testing in our testing lab, with jtest.

The box has 24 cores (logical), dual 10GE NICs, and an all-SSD system. Each TS is set to 20 ET_NET threads, and 4 clients stress one server. On each client, I start 6 jtest instances with:
{code}
screen jtest -P xxx.xxx.xxx.xxx -S ts.cn -s 9990 -z 0.99 -D xxx -c 30
{code}

That is a 99% hit condition, and the numbers show that under full load the patch may provide better response time (12% less) but uses a little more CPU (2% more).

The current master:
{code}
Time            -----------------cpu-----------------  ---------------ts--------------
Time             user    sys   wait   hirq   sirq   util    qps   cons    Bps    rt   rpc
14/04/13-00:21  23.81  10.66   0.36   0.00  34.06  68.53  92.0K  18.4K   1.3G  3.38  5.00
14/04/13-00:22  23.72  10.61   0.36   0.00  34.18  68.50  92.0K  18.4K   1.3G  3.37  5.00
14/04/13-00:23  23.55  10.60   0.31   0.00  33.65  67.80  91.0K  18.2K   1.3G  3.44  5.00
14/04/13-00:24  23.72  10.68   0.35   0.00  34.13  68.53  92.1K  18.4K   1.3G  3.35  5.00
14/04/13-00:25  23.75  10.68   0.35   0.00  34.11  68.55  92.2K  18.4K   1.3G  3.33  5.00
14/04/13-00:26  23.51  10.53   0.33   0.00  33.66  67.71  90.9K  18.2K   1.3G  3.53  5.00
14/04/13-00:27  23.81  10.63   0.35   0.00  34.00  68.44  91.9K  18.4K   1.3G  3.46  5.00
14/04/13-00:28  23.79  10.64   0.36   0.00  34.04  68.46  92.2K  18.4K   1.3G  3.37  5.00
14/04/13-00:29  23.72  10.67   0.34   0.00  33.93  68.32  91.9K  18.4K   1.3G  3.34  5.00
14/04/13-00:30  23.95  10.63   0.34   0.00  34.18  68.76  92.5K  18.5K   1.3G  3.31  5.00
{code}

The current master with patch version v11jp:
{code}
Time            -----------------cpu-----------------  ---------------ts--------------
Time             user    sys   wait   hirq   sirq   util    qps   cons    Bps    rt   rpc
14/04/13-00:22  25.20  10.55   0.25   0.00  34.12  69.87  91.9K  18.4K   1.3G  2.97  5.00
14/04/13-00:23  25.36  10.59   0.26   0.00  34.30  70.25  92.4K  18.5K   1.3G  2.98  5.00
14/04/13-00:24  25.51  10.57   0.26   0.00  34.23  70.30  92.1K  18.4K   1.3G  2.99  5.00
14/04/13-00:25  25.12  10.54   0.26   0.00  34.01  69.66  91.5K  18.3K   1.3G  2.98  5.00
14/04/13-00:26  25.33  10.57   0.25   0.00  34.33  70.23  92.2K  18.4K   1.3G  2.93  5.00
14/04/13-00:27  25.40  10.64   0.26   0.00  34.16  70.20  92.4K  18.5K   1.3G  2.94  5.00
14/04/13-00:28  25.25  10.53   0.26   0.00  33.94  69.72  91.6K  18.3K   1.3G  3.01  5.00
14/04/13-00:29  25.34  10.63   0.25   0.00  34.16  70.14  92.4K  18.5K   1.3G  2.93  5.00
14/04/13-00:30  25.41  10.55   0.26   0.00  34.21  70.17  92.3K  18.5K   1.3G  2.96  5.00
14/04/13-00:31  25.33  10.56   0.26   0.00  34.11  69.99  91.8K  18.4K   1.3G  2.97  5.00
14/04/13-00:32  25.42  10.62   0.26   0.00  34.29  70.32  92.3K  18.5K   1.3G  2.92  5.00
{code}
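The "12% less response time, 2% more CPU" summary can be reproduced by averaging the `rt` and `util` columns of the two tables above. The lists below are copied from those columns, on the assumption that the fused fields in the archived tables split as `rt` followed by `rpc` (for example `3.385.00` as 3.38 and 5.00); the variable names are illustrative.

```python
from statistics import mean

# "rt" (response time) and "util" (CPU utilisation) columns, per minute.
master_rt = [3.38, 3.37, 3.44, 3.35, 3.33, 3.53, 3.46, 3.37, 3.34, 3.31]
patched_rt = [2.97, 2.98, 2.99, 2.98, 2.93, 2.94, 3.01, 2.93, 2.96, 2.97, 2.92]
master_util = [68.53, 68.50, 67.80, 68.53, 68.55, 67.71, 68.44, 68.46, 68.32, 68.76]
patched_util = [69.87, 70.25, 70.30, 69.66, 70.23, 70.20, 69.72, 70.14, 70.17,
                69.99, 70.32]

rt_drop_pct = (1 - mean(patched_rt) / mean(master_rt)) * 100
util_delta_points = mean(patched_util) - mean(master_util)

print(f"mean rt:   {mean(master_rt):.2f} -> {mean(patched_rt):.2f} "
      f"({rt_drop_pct:.1f}% lower)")
print(f"mean util: {mean(master_util):.2f} -> {mean(patched_util):.2f} "
      f"(+{util_delta_points:.2f} points)")
```

On these samples the mean response time drops by a bit over 12%, while CPU utilisation rises by a little under 2 percentage points, at essentially unchanged QPS.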
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631112#comment-13631112 ]

John Plevyak commented on TS-1405:
----------------------------------

A one-third drop in performance on any test is a red flag. There is definitely something wrong. There are two things going on in this patch:

1) it replaces the power of 2 buckets with a time wheel, and
2) it introduces an atomic list as a mechanism for freeing up events quickly.

Perhaps we can test the two separately? In particular, we can remove the atomic list effects by just having Event::cancel_event() call cancel_action() and commenting out the call to process_cancelled_events().

Leif, you up for running your test again with that change?
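To illustrate the two cancellation strategies being separated here, the sketch below contrasts "lazy" cancellation (mark the event and drop it when its bucket is next visited) with an "eager" path that queues cancelled events so the owning thread can free them early. It is a toy model under stated assumptions, not the patch: `Event`, `Scheduler`, `process_cancelled` and `fire_due` are hypothetical names, and a `collections.deque` stands in for the atomic list.

```python
from collections import deque

class Event:
    def __init__(self, name):
        self.name = name
        self.cancelled = False

class Scheduler:
    """Toy scheduler comparing lazy and eager handling of cancelled events."""

    def __init__(self, eager_cancel):
        self.pending = {}            # name -> Event still held by the scheduler
        self.cancel_list = deque()   # stand-in for the atomic list
        self.eager_cancel = eager_cancel

    def schedule(self, name):
        event = Event(name)
        self.pending[name] = event
        return event

    def cancel(self, event):
        # Lazy path: only mark the event. Eager path: also queue it so the
        # owning thread can release it before its timeout comes around.
        event.cancelled = True
        if self.eager_cancel:
            self.cancel_list.append(event)

    def process_cancelled(self):
        """Drain the cancel list; returns how many events were freed early."""
        freed = 0
        while self.cancel_list:
            event = self.cancel_list.popleft()
            if self.pending.pop(event.name, None) is not None:
                freed += 1
        return freed

    def fire_due(self):
        """Pretend every pending event is due; cancelled ones are skipped."""
        fired = []
        for name, event in list(self.pending.items()):
            del self.pending[name]
            if not event.cancelled:
                fired.append(name)
        return fired

lazy = Scheduler(eager_cancel=False)
eager = Scheduler(eager_cancel=True)
for sched in (lazy, eager):
    sched.schedule("keep")
    sched.cancel(sched.schedule("drop"))

held_lazy = len(lazy.pending)        # cancelled event is still held
freed_eager = eager.process_cancelled()
held_eager = len(eager.pending)      # cancelled event already released
print(held_lazy, held_eager, lazy.fire_due(), eager.fire_due())
```

Both variants fire the same events; they differ only in when a cancelled event's memory is released and in the extra per-iteration work of draining the list, which is the cost the proposed experiment tries to isolate.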
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631209#comment-13631209 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

Yeah, of course. Let's get this figured out.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627875#comment-13627875 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

I think the max being down is an artifact of less pressure on the box (since it now can only do about 60% of the traffic it used to). I ran a few more tests; the second one tries to reduce the pressure on the box to verify that the max response time is due to the system being on its knees.

With this patch, and 500 connections (there's no noticeable difference, other than the mean time being 30% worse):
{code}
6378965 fetches on 580129 conns, 498 max parallel, 6.378960E+08 bytes, in 60 seconds
100 mean bytes/fetch
106315.6 fetches/sec, 1.063156E+07 bytes/sec
msecs/connect: 0.245 mean, 8.846 max, 0.042 min
msecs/first-response: 3.791 mean, 207.045 max, 0.079 min
{code}

Current master with 300 connections, but at a lower QPS (so less pressure):
{code}
8850329 fetches on 8 conns, 300 max parallel, 8.850330E+08 bytes, in 60 seconds
100 mean bytes/fetch
147505.5 fetches/sec, 1.475055E+07 bytes/sec
msecs/connect: 0.191 mean, 2.037 max, 0.043 min
msecs/first-response: 0.678 mean, 77.340 max, 0.085 min
{code}

So even though this second test on master is doing significantly more QPS (almost 50% more), it still has much better response times across the board. By reducing the throughput in this last test, such that the system resources aren't at their limits, the response times improve. I think that's why, with the patch, you see slightly better response times on Max, but it's really not indicative of the patch improving anything. It's because with the patch, ATS simply can't put the system under pressure.

This is pretty much the same problem I posted about early on here. As far as I can tell, it's gotten noticeably worse since the first patch sets :).
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628638#comment-13628638 ]

Bin Chen commented on TS-1405:
------------------------------

http_load -parallel 100 -seconds 60 -keep_alive 100 /tmp/URL

Are all the requests for /tmp/URL cache hits, or misses? What is the hit ratio?
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628664#comment-13628664 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

100% cache hit ratio. Note that I run 3x of those http_load instances, for a total of 300 connections.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627276#comment-13627276 ]

Leif Hedstrom commented on TS-1405:
-----------------------------------

linux_time_wheel_v11jp.patch doesn't apply cleanly on master :).
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627287#comment-13627287 ] Leif Hedstrom commented on TS-1405: --- I just tested this, and I still have quite dramatic performance problems. See the numbers below. One thing to notice is that with the patch, my box can only use 400% CPU, whereas without it, about 600% (which probably explains some of the difference). Also notice the much, much higher latency even at much lower throughput:

Current master:
{code}
tinkerballa (17:37) 265/0 $ ~/benchit.sh 100 60 100 /tmp/URL
http_load -parallel 100 -seconds 60 -keep_alive 100 /tmp/URL
9502994 fetches on 94235 conns, 300 max parallel, 9.502990E+08 bytes, in 60 seconds
100 mean bytes/fetch
158383.2 fetches/sec, 1.583832E+07 bytes/sec
msecs/connect: 0.471 mean, 4.960 max, 0.048 min
msecs/first-response: 1.263 mean, 385.282 max, 0.102 min
{code}

With v11jp.patch:
{code}
http_load -parallel 100 -seconds 60 -keep_alive 100 /tmp/URL
6352139 fetches on 63036 conns, 300 max parallel, 6.352130E+08 bytes, in 60 seconds
100 mean bytes/fetch
105869.1 fetches/sec, 1.058691E+07 bytes/sec
msecs/connect: 0.181 mean, 7.087 max, 0.049 min
msecs/first-response: 2.704 mean, 244.925 max, 0.080 min
{code}
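To put the two http_load runs above side by side, here is a small C++ sketch that computes the relative change in throughput and in mean first-response time. The numbers are copied from the output quoted above; the function names are made up for this illustration and are not part of benchit.sh or http_load.

```cpp
#include <cassert>

// Figures copied from the two http_load runs quoted above.
constexpr double kMasterFetchesPerSec  = 158383.2;
constexpr double kPatchedFetchesPerSec = 105869.1;
constexpr double kMasterFirstRespMs    = 1.263;
constexpr double kPatchedFirstRespMs   = 2.704;

// Relative change from baseline to candidate, in percent.
// Negative means the candidate value is lower than the baseline.
inline double percent_change(double baseline, double candidate) {
  assert(baseline != 0.0);
  return (candidate - baseline) / baseline * 100.0;
}

// Hypothetical convenience wrappers for the two metrics of interest.
inline double throughput_change_pct() {
  return percent_change(kMasterFetchesPerSec, kPatchedFetchesPerSec);
}

inline double first_response_change_pct() {
  return percent_change(kMasterFirstRespMs, kPatchedFirstRespMs);
}
```

With these inputs the throughput falls by roughly a third while the mean first-response time slightly more than doubles, which is the gap the comment above is pointing at.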
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627411#comment-13627411 ] John Plevyak commented on TS-1405: -- Weird. The min and max are down, but the mean is up. What happens when you go to 500 connections? I am wondering if it is an efficiency or a latency issue.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625411#comment-13625411 ] Bin Chen commented on TS-1405: -- The last patch has been running on our ten boxes for five days. These boxes serve about 5K qps. Maybe we can commit this patch after one week if there are no problems. What do you think, John?
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625447#comment-13625447 ] John Plevyak commented on TS-1405: -- Sounds good. What sort of CPU/Memory improvements are you seeing?
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625460#comment-13625460 ] John Plevyak commented on TS-1405: -- The patch includes:

+#if AIO_MODE == AIO_MODE_NATIVE
+#define AIO_PERIOD -HRTIME_MSECONDS(4)
+#else

Even if it was set to zero, on an unloaded system it would only get polled every 10 msecs because that is the poll rate for epoll(), so you could potentially delay a disk IO by that amount of time.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625492#comment-13625492 ] taorui commented on TS-1405: In reply to John Plevyak (JIRA), 04/08/2013 11:27 PM: Yes, on an unloaded system the problem you mentioned exists. Should we add a trigger mechanism to wake the thread up from epoll_wait for disk IO events? I chose this scheme because it is easy to implement.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625510#comment-13625510 ] John Plevyak commented on TS-1405: -- Perhaps this is a larger issue. We use eventfd to wake up the event thread on an unloaded system, but it would be best to avoid using it when the system becomes loaded, as it is expensive and tends to cause spinning on moderately loaded systems. Perhaps instead we should have operational regimes: use blocking IO threads on an unloaded or lightly loaded system and switch to AIO as the system becomes more heavily loaded. I would also be interested to see how this interacts with SSDs, which can have wait times in the microsecond range. The crossover point for an SSD system is likely different than for an HDD system.
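For readers unfamiliar with the wake-up mechanism discussed above, here is a minimal Linux-only C++ sketch of waking a thread blocked in epoll_wait() through an eventfd. It is not taken from the patch or from Traffic Server; the Waker type and its method names are hypothetical, and only the system calls (eventfd, epoll_create1, epoll_ctl, epoll_wait, read, write) are real.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper: lets another thread interrupt an epoll_wait() early.
struct Waker {
  int epfd = -1;  // epoll instance the event thread blocks on
  int evfd = -1;  // eventfd used purely as a wake-up signal

  bool init() {
    epfd = epoll_create1(0);
    evfd = eventfd(0, EFD_NONBLOCK);
    if (epfd < 0 || evfd < 0) return false;
    epoll_event ev{};
    ev.events  = EPOLLIN;
    ev.data.fd = evfd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, evfd, &ev) == 0;
  }

  // Producer side (for example, code that has just completed a disk IO).
  bool signal() {
    const uint64_t one = 1;
    return write(evfd, &one, sizeof(one)) == static_cast<ssize_t>(sizeof(one));
  }

  // Event-thread side: true if woken by signal(), false on timeout or error.
  bool wait(int timeout_ms) {
    epoll_event ev{};
    const int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n == 1 && ev.data.fd == evfd) {
      uint64_t count = 0;  // reading resets the eventfd counter to zero
      return read(evfd, &count, sizeof(count)) == static_cast<ssize_t>(sizeof(count));
    }
    return false;
  }

  ~Waker() {
    if (evfd >= 0) close(evfd);
    if (epfd >= 0) close(epfd);
  }
};
```

Without a signal, wait(10) simply sleeps for the full 10 msec poll interval, which is the delay described earlier in the thread. Each signal() costs a write system call, so this sketch says nothing about whether the approach is affordable under load.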
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626145#comment-13626145 ] Bin Chen commented on TS-1405: -- test box:
Cluster (cluster_type == 1), 10 * Cache Server
CPU: Intel(R) Xeon(R) CPU L5630 @ 2.13GHz
RAM: MemTotal: 49416984 kB
Interface: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network
Total Throughput: 8Gbps
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626166#comment-13626166 ] Leif Hedstrom commented on TS-1405: --- I'll try to do some benchmarks tomorrow morning. Which patch is the one that would be committed ?
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626171#comment-13626171 ] Bin Chen commented on TS-1405: -- linux_time_wheel_v11jp.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619345#comment-13619345 ] James Peach commented on TS-1405: - I started trying to review this. It would be really helpful to have some comments around PriorityEventQueue, particularly the various constants. I'll spend more time on this tomorrow; maybe it will become clearer as I read more ;)
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619536#comment-13619536 ] Bin Chen commented on TS-1405: -- Yeah, I should read it more carefully. But I can test this patch first. Thanks, John.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618129#comment-13618129 ] John Plevyak commented on TS-1405: -- I missed one case, fixed in v11. I agree that you won't see the race if the timeout (50msec) is sufficiently large and no thread fails to be rescheduled and run in that amount of time, but I think such timing-dependent behavior is to be avoided if possible. We have a couple of other races of this type: uses of new_Freer() and flushing of the log buffers. The former uses a much larger timeout (1 minute), while the latter may be a cause of occasional crashes which we have not been able to debug for years. Experiences with the log buffer flushing issue are why I am not happy with a race in the event code.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617469#comment-13617469 ] Bin Chen commented on TS-1405: -- If an event is in EThread::process_event() and its cancel flag is set after MUTEX_RELEASE(lock), the event will be freed in process_event. The event will also be pushed to the atomic_list. So the event is freed while it is already in the atomic list. We use the atomic_list to process the cancel flag because it reclaims cancelled events more quickly (and so uses less memory). If we only handle events that are in_the_priority_queue and reclaim events that will not be processed immediately (e.g. event->time_at > 50ms), the design will be simple and the memory usage will be acceptable.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617873#comment-13617873 ] John Plevyak commented on TS-1405: -- No, in the current patch (v10) in process_event the event will only be free'd if cancelled is set to CANCEL_SET which means that the Event is not in the atomic_list. The current v10 patch is simple, fast and has no delay and hence no opportunity for timing related problems. The previous patch checks Event::in_the_priority_queue which can change state at any time when the Event::ethread != this_ethread(). This is a race, and as a result the state of the Event being on the atomic_list is not knowable in the EThread during ::execute(). This will result in crashes. You may not be seeing them because we typically pin all transactions to a single thread unless proxy.config.share_server_session is set to 1, so Event::ethread == this_ethread(), however that is not the case in general. Try testing with this and the appropriate configuration and you will see the problem.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617877#comment-13617877 ] John Plevyak commented on TS-1405: -- If anyone else would like to chime in, I would appreciate it. Race conditions are subtle and when they exist, lead to random crashes which are very difficult to debug, so I would like to be sure that we are not introducing any races with this change.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616342#comment-13616342 ] John Plevyak commented on TS-1405: -- I am still concerned about race conditions with the v9 patch. In particular, when the cancelled flag is set it is possible (but not certain) that the event will be in the atomic list. If it is, then it should not be free'd, but if it is not, it should be. Doing the wrong thing is either a leak or memory corruption. Furthermore, if we are cancelling from a different thread than the one the Event is on, the in_the_priority_queue flag is racy (it may change at any time) and hence should not be relied upon. Attached please find v10. This patch converts the 'cancelled' flag into a multi-state variable which captures whether or not the Event is in the atomic list. All tests of the cancelled variable now do the right thing with respect to the state of the event. Bin Chen: please take a look at this patch and consider the possible races and tell me what you think.
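The multi-state 'cancelled' variable described above could look roughly like the following C++ sketch. This is not the v10 patch: only the name CANCEL_SET appears in this thread, while CANCEL_NONE, CANCEL_IN_LIST, the stripped-down Event struct and both helper functions are assumptions made for illustration, and std::atomic stands in for whatever primitives the patch actually uses.

```cpp
#include <atomic>
#include <cassert>

// Only CANCEL_SET is named in the thread; the other two names are hypothetical.
enum CancelState : int {
  CANCEL_NONE    = 0,  // event is live
  CANCEL_SET     = 1,  // cancelled and NOT on the atomic list: owner thread frees it
  CANCEL_IN_LIST = 2,  // cancelled and queued on the atomic list: list consumer frees it
};

// Hypothetical stand-in for the real Event class.
struct Event {
  std::atomic<int> cancelled{CANCEL_NONE};
};

// Try to cancel. Returns the state this call installed, or CANCEL_NONE if
// another caller had already cancelled the event (so exactly one caller wins).
inline int cancel(Event &e, bool on_owner_thread) {
  int expected      = CANCEL_NONE;
  const int desired = on_owner_thread ? CANCEL_SET : CANCEL_IN_LIST;
  if (e.cancelled.compare_exchange_strong(expected, desired)) {
    // A cross-thread winner (CANCEL_IN_LIST) would push the event onto the
    // atomic list at this point; that step is omitted from the sketch.
    return desired;
  }
  return CANCEL_NONE;
}

// The owner thread may free the event while processing it only when the
// state says the event is not on the atomic list.
inline bool may_free_in_process_event(const Event &e) {
  return e.cancelled.load() == CANCEL_SET;
}
```

Because the state itself records whether the event went onto the list, the owner thread never has to consult a flag such as in_the_priority_queue that another thread may be changing at the same moment.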
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614895#comment-13614895 ] John Plevyak commented on TS-1405: -- I have uploaded a small modification on the recent v8 patch. This modification removes the delay, fixes a memory leak (of Mutex) and avoids going through the atomic list if we are on the same thread (the typical case).
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614919#comment-13614919 ] Bin Chen commented on TS-1405: -- thank you very much
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612083#comment-13612083 ] Bin Chen commented on TS-1405: -- I strongly agree with this advice. If events are being used incorrectly, we will fix that.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611638#comment-13611638 ] John Plevyak commented on TS-1405: --

+TS_INLINE void
+Event::cancel_event(Continuation * c)
+{
+  if (!cancelled) {
+    ink_assert(!c || c == continuation);
+    ethread->set_event_cancel(this);
+    cancelled = true;
+  }
+}

Once set_event_cancel has run, the Event may be deleted at any time. Do not set the cancelled flag here. It is set in set_event_cancel() in any case. If you set it here you can overwrite free memory (or worse, another event).
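The ordering rule in the comment above (finish every write to the event before handing it to another thread, and never touch it afterwards) can be sketched in C++ as follows. This is an illustration only: Node, AtomicList and cancel_and_publish are hypothetical names, and the list is a plain std::atomic push/pop-all stack rather than Traffic Server's ink_atomiclist.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an event that can sit on a cancel list.
struct Node {
  bool cancelled = false;
  Node *next     = nullptr;
};

// Minimal lock-free LIFO list supporting push and "take everything".
class AtomicList {
  std::atomic<Node *> head_{nullptr};

public:
  void push(Node *n) {
    Node *old = head_.load(std::memory_order_relaxed);
    do {
      n->next = old;  // link to the current head before publishing
    } while (!head_.compare_exchange_weak(old, n, std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Detach the whole chain in one step; the caller then owns every node on it.
  Node *pop_all() { return head_.exchange(nullptr, std::memory_order_acquire); }
};

// Set the flag first, then publish. After push() returns, the consumer may
// already have freed the node, so nothing here may read or write *n again.
inline void cancel_and_publish(AtomicList &list, Node *n) {
  n->cancelled = true;
  list.push(n);
}
```

Writing `cancelled = true` after the push, as in the quoted patch fragment, would be exactly the store into possibly freed memory that the comment warns about.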
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611724#comment-13611724 ]
Bin Chen commented on TS-1405:
Not every event gets its cancelled flag set by set_event_cancel(), so we should still set cancelled = true in cancel_event(). Some events will have the flag set twice. Because we delay freeing the event, setting it twice should not be a problem.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, linux_time_wheel_v8.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611749#comment-13611749 ]
John Plevyak commented on TS-1405:
I think depending on the delay is brittle. You can never tell how long a thread will be delayed in an overloaded system, and the delay increases memory pressure. Rather, I would remove the delay, moving the line
+  event_cancel_list_head = (Event *) ink_atomiclist_popall(event_cancel_list);
above the loop in process_cancel_event() (and remove the time test). Then I would move the assignment of cancelled = true into set_event_cancel:
  if (!e->cancelled) {
    if (e->in_the_priority_queue && (e->timeout_at - e->ethread->cur_time) > HRTIME_SECONDS(event_cancel_limit)) {
      /* prevent several threads from racing to cancel one event */
      e->cancelled = true;
      ink_atomiclist_push(event_cancel_list, e);
    } else
      e->cancelled = true;
  }
In fact, I would just incorporate the code in set_event_cancel into cancel_event() since it is only called in one place. So, I agree that the delay would most likely have prevented a problem, but I think it would be better to not have it, because when future programmers see a constant delay, they might be tempted to decrease it to the point where problems might occur.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, linux_time_wheel_v8.patch
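The shape of the suggestion in this comment (flag first, push only far-future events, drain the whole list at the top of the processing step with no time test) can be sketched as follows. This is a simplified, self-contained approximation under stated assumptions: `Event`, `Thread`, `hrtime`, `kSecond` and the `free_event` callback are hypothetical stand-ins, `std::atomic` replaces the ink_atomiclist calls, and the caller of `cancel_event` is assumed to hold the continuation's mutex.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using hrtime = std::int64_t;             // nanoseconds, in the spirit of ink_hrtime
constexpr hrtime kSecond = 1000000000LL; // stand-in for HRTIME_SECONDS(1)

// Hypothetical, reduced Event: only what the cancel path needs.
struct Event {
  std::atomic<bool> cancelled{false};
  bool in_the_priority_queue = false;
  hrtime timeout_at = 0;
  Event *cancel_next = nullptr;
};

// Hypothetical, reduced EThread.
struct Thread {
  hrtime cur_time = 0;
  hrtime event_cancel_limit = 0; // seconds
  std::atomic<Event *> event_cancel_list{nullptr};
};

// Caller is assumed to hold the continuation's mutex.
void cancel_event(Thread &t, Event *e) {
  if (e->cancelled.load()) {
    return;
  }
  bool far_away = e->in_the_priority_queue &&
                  (e->timeout_at - t.cur_time) > t.event_cancel_limit * kSecond;
  e->cancelled.store(true); // set before the event becomes visible on the list
  if (far_away) {
    // Far-future events will not be dispatched soon, so hand them over for early freeing.
    Event *head = t.event_cancel_list.load();
    do {
      e->cancel_next = head;
    } while (!t.event_cancel_list.compare_exchange_weak(head, e));
  }
  // Near-term events are only flagged; the normal dispatch path is expected to drop them.
}

// Owner thread: take the whole list first, then release everything on it.
template <typename FreeFn>
int process_cancel_event(Thread &t, FreeFn free_event) {
  Event *e = t.event_cancel_list.exchange(nullptr);
  int freed = 0;
  while (e != nullptr) {
    Event *next = e->cancel_next; // read before the event is released
    free_event(e);
    e = next;
    ++freed;
  }
  return freed;
}
```

Because the list is detached in one step before the loop, anything pushed while the loop runs simply waits for the next pass, and no fixed delay is needed to tell "old" entries from "new" ones.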
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609122#comment-13609122 ]
John Plevyak commented on TS-1405:
Why is it segfaulting? Can we back out the commit(s) which caused the problem?
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, linux_time_wheel_v8.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609204#comment-13609204 ]
James Peach commented on TS-1405:
I just disabled TS-1742, which was causing freelist segfaults.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, linux_time_wheel_v8.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607432#comment-13607432 ]
Bin Chen commented on TS-1405:
1. Replaced 4 s with 20 ms: we need to keep a cancelled event from being processed (process_event) while some continuation still references it.
2. I switch event_cancel_list_head so that a cancelled event is only freed after the delay (20 ms).
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607436#comment-13607436 ]
Bin Chen commented on TS-1405:
Should there be no race in Event::cancel_event in ts?
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13608136#comment-13608136 ]
John Plevyak commented on TS-1405:
If everything is correct there should be no race. You shouldn't be setting the 'cancelled' flag in cancel_event() since it is set in set_event_cancel().
Remove the ink_release_assert(). We should not have any of these: they slow the code down and lead to crash storms which are bad for everyone.
There is no race because the caller needs to be holding the mutex, and after the call to cancel_event() the event is considered dead (which is why you shouldn't be setting the cancelled flag AFTER inserting the event into the cancel atomic list, because that is a race).
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13608142#comment-13608142 ]
John Plevyak commented on TS-1405:
There are only very limited reasons to use an ink_release_assert, in particular if it looks like we could be returning the wrong content to a user. We shouldn't use them to check other invariants as such checks just slow down the production server and are better done during regression testing and not at production time. Moreover, a server that crashes can cause major service disruption, so the assert itself may very well cause more harm than a bug.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch
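The distinction drawn in this comment, between a check that stays in production builds and one that only runs in debug or regression builds, might look roughly like the sketch below. The macro and function names (`SKETCH_RELEASE_ASSERT`, `SKETCH_DEBUG_ASSERT`, `sketch_fail`, `serve_cached`) are invented for this example and are not the ATS definitions of ink_assert or ink_release_assert.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper and macro names; these are not the ATS definitions.
[[noreturn]] inline void sketch_fail(const char *expr, const char *file, int line) {
  std::fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
  std::abort();
}

// Always compiled in: reserved for cases such as "we might return the wrong content".
#define SKETCH_RELEASE_ASSERT(cond) ((cond) ? (void)0 : sketch_fail(#cond, __FILE__, __LINE__))

// Compiled out when NDEBUG is defined, so production builds pay nothing for it.
#ifdef NDEBUG
#define SKETCH_DEBUG_ASSERT(cond) ((void)0)
#else
#define SKETCH_DEBUG_ASSERT(cond) SKETCH_RELEASE_ASSERT(cond)
#endif

// Toy lookup: returns the cached body length for a key.
int serve_cached(int requested_key, int cached_key, int body_len) {
  SKETCH_RELEASE_ASSERT(requested_key == cached_key); // a wrong object would reach the user
  SKETCH_DEBUG_ASSERT(body_len >= 0);                 // internal invariant, checked in testing
  return body_len;
}
```

The design choice is about cost and blast radius: the first check guards correctness visible to users and is worth a crash, while the second only documents an internal invariant and disappears from release builds.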
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13608576#comment-13608576 ]
Leif Hedstrom commented on TS-1405:
One request: can we get master into a state where it actually doesn't segfault before we commit this? Right now, it's impossible to run regressions or performance tests, and on stuff that is this involved, I'd like to do both.
Cheers,
-- Leif
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, linux_time_wheel_v8.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606671#comment-13606671 ]
John Plevyak commented on TS-1405:
You are using EVENT_FREE, which does not free the mutex (which is reference counted) by setting it to NULL. Try using free_event().
Also, I think process_cancel_event() shouldn't delay for 4 seconds; that is far too long. Perhaps 10 msec?
Finally, why is the ink_atomiclist_popall happening at the end of process_cancel_event()? Shouldn't event_cancel_list_head be local and the call happen at the start (after the delay)?
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch
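The first point in this comment, that recycling an object without clearing its reference-counted mutex keeps the mutex alive, can be shown with a small analogy. The sketch assumes `std::shared_ptr<std::mutex>` as a stand-in for the reference-counted mutex pointer, and `EventPool`, `raw_free` and this `free_event` are hypothetical names that only mirror the behaviour described in the comment.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical, reduced Event; shared_ptr stands in for a reference-counted mutex pointer.
struct Event {
  std::shared_ptr<std::mutex> mutex;
};

// Hypothetical freelist that recycles Event objects without running destructors.
struct EventPool {
  std::vector<Event *> free_list;

  // Recycles the object as-is: the reference to the mutex is still held.
  void raw_free(Event *e) { free_list.push_back(e); }

  // Drops the reference first, then recycles the object.
  void free_event(Event *e) {
    e->mutex.reset(); // same effect as assigning NULL to a reference-counted pointer
    free_list.push_back(e);
  }
};
```

An event recycled through `raw_free` pins its mutex until the slot is reused, which is the kind of leak the comment points at.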
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604324#comment-13604324 ]
John Plevyak commented on TS-1405:
Could you update this patch to be against the current master branch? I am getting a compile failure:
UnixEThread.cc: In constructor 'EThread::EThread()':
UnixEThread.cc:57:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope
UnixEThread.cc: In constructor 'EThread::EThread(ThreadType, int)':
UnixEThread.cc:79:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope
UnixEThread.cc: In constructor 'EThread::EThread(ThreadType, Event*, ink_sem*)':
UnixEThread.cc:116:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope
and a patch failure:
--- iocore/net/P_UnixNetVConnection.h
+++ iocore/net/P_UnixNetVConnection.h
@@ -339,7 +339,7 @@
   inactivity_timeout_in = 0;
 #ifdef INACTIVITY_TIMEOUT
   if (inactivity_timeout) {
-    inactivity_timeout->cancel_action(this);
+    inactivity_timeout->cancel_event(this);
     inactivity_timeout = NULL;
   }
 #else
@@ -351,7 +351,7 @@ UnixNetVConnection::cancel_active_timeout()
 {
   if (active_timeout) {
-    active_timeout->cancel_action(this);
+    active_timeout->cancel_event(this);
     active_timeout = NULL;
     active_timeout_in = 0;
   }
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604479#comment-13604479 ]
Bin Chen commented on TS-1405:
This patch is based on 3.2. I will rebase it on master.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604486#comment-13604486 ]
John Plevyak commented on TS-1405:
Thanx!
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588493#comment-13588493 ]
John Plevyak commented on TS-1405:
I am getting some compilation errors with gcc 4.7.2:
UnixEThread.cc:159:83: error: no matching function for call to 'ink_atomic_cas(int32_t*, bool, bool)'
UnixEThread.cc:159:83: note: candidate is:
In file included from ../../lib/ts/libts.h:52:0,
                 from P_EventSystem.h:39,
                 from UnixEThread.cc:30:
../../lib/ts/ink_atomic.h:152:1: note: template<class T> bool ink_atomic_cas(volatile T*, T, T)
../../lib/ts/ink_atomic.h:152:1: note: template argument deduction/substitution failed:
UnixEThread.cc:159:83: note: deduced conflicting types for parameter 'T' ('int' and 'bool')
Also:
UnixEThread.cc: In constructor 'EThread::EThread()':
UnixEThread.cc:58:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.1 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch
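The first error above is a template argument deduction conflict: the pointer argument fixes T as a 32-bit integer while the two bool literals ask for T = bool. The sketch below reproduces the same shape with a hypothetical `cas` helper built on `std::atomic` (an assumption for portability; it is not `ink_atomic_cas`) and shows one way to make all three arguments agree. `try_claim` is likewise an invented name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in with the same shape as the candidate in the error message.
template <class T>
bool cas(std::atomic<T> *mem, T prev, T next) {
  return mem->compare_exchange_strong(prev, next);
}

// Tries to move the flag from 0 to 1; returns true only for the caller that wins.
bool try_claim(std::atomic<std::int32_t> *flag) {
  // cas(flag, false, true); // would not compile: T is deduced as both int32_t and bool
  return cas(flag, std::int32_t{0}, std::int32_t{1}); // all three arguments agree on T
}
```

Spelling the type out at the call site, as in `cas<std::int32_t>(flag, false, true)`, is another way to avoid the conflict, because no deduction takes place and the bool literals are converted.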
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588498#comment-13588498 ]
John Plevyak commented on TS-1405:
Instance variables such as CancelList need to start with a lower-case letter and use _ to separate words (like all the other variables in this file).
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.1 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588500#comment-13588500 ]
John Plevyak commented on TS-1405:
The atomic list is singly linked, so you could use SLINK for clink in Event. There are lots of events, so an extra field is worth saving.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.1 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch
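The saving mentioned in this comment is one pointer per event. The sketch below illustrates it with hypothetical link types (`SingleLink`, `DoubleLink`) written in the spirit of SLINK and LINK; they are not the ATS macros, and the two event structs exist only to compare sizes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical link types in the spirit of SLINK/LINK; not the ATS macros themselves.
template <class T> struct SingleLink { T *next = nullptr; };
template <class T> struct DoubleLink { T *next = nullptr; T *prev = nullptr; };

struct EventWithDoubleLink {
  int payload = 0;
  DoubleLink<EventWithDoubleLink> clink;
};

struct EventWithSingleLink {
  int payload = 0;
  SingleLink<EventWithSingleLink> clink;
};

// A pop-all style traversal only ever follows `next`, so `prev` would be dead weight.
inline std::size_t count_cancelled(const EventWithSingleLink *head) {
  std::size_t n = 0;
  for (; head != nullptr; head = head->clink.next) {
    ++n;
  }
  return n;
}
```

On common 64-bit platforms the difference is 8 bytes per event, which adds up when a busy server holds a very large number of events.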
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588925#comment-13588925 ]
Bin Chen commented on TS-1405:
1. In EThread::set_event_cancel(), an event is only put on the cancel list in this case:
a. It has been inserted into the priority queue (e->in_the_priority_queue) and e->timeout_at > now + event_cancel_delay (s).
b. An event in the localQueue will be processed soon, so it is not put on the cancel list.
This means less cancel handling, so a cancelled event won't be in a race condition.
2. If the cancelled flag can only be set while holding the mutex of the Event, set_event_cancel() becomes simpler.
3. If INACTIVITY_TIMEOUT is defined, the vc's timeout event is still used by the vc. If we call free_event immediately, there is still a race condition (some vc uses the timeout, but the event has already been freed by process_cancelled_event). That is why I added event_cancel_delay.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.1 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588932#comment-13588932 ]
Bin Chen commented on TS-1405:
I will update the patch according to the other comments:
1. Modify set_event_cancel() and remove the race handling around setting the cancelled flag.
2. Rename some functions and variables.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.1 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451549#comment-13451549 ]
weijin commented on TS-1405:
The v3 patch has a more severe race than v2. It may lead to the continuation being called back even though we cancelled the event.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: kuotai Assignee: kuotai Fix For: 3.3.1 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451365#comment-13451365 ]
weijin commented on TS-1405:
I'm afraid the v2 patch still has a race in event cancel: when the cancelling thread has set the cancel flag but has not yet set in_the_cancel_queue, the thread that owns the event may run ProtectedQueue::dequeue_timed.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: kuotai Assignee: kuotai Fix For: 3.3.1 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449950#comment-13449950 ]
John Plevyak commented on TS-1405:
There is a race between the adding into the atomic list in the cancelling thread, getting dequeued in the controlling thread, and the setting of the cancelled flag in the cancelling thread. One solution is to take the mutex lock in the check_ready code, as the cancelling thread must be holding that lock over the insert into the atomic list and the setting of the cancelled flag. Note, you could set the cancelled flag before adding to the atomic list and then just ignore it in process_thread() (and any other place), counting on it getting freed eventually via the atomic list.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: kuotai Assignee: kuotai Fix For: 3.3.1 Attachments: linux_time_wheel.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449953#comment-13449953 ]
John Plevyak commented on TS-1405:
weijin: I don't know that freeing it as soon as possible is as big a goal as race conditions are a problem :) The current code can take up to 5 seconds to free a cancelled event, so this code is much better in that regard, even if we have to wait for the next time the event loop runs.
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: kuotai Assignee: kuotai Fix For: 3.3.1 Attachments: linux_time_wheel.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448862#comment-13448862 ]
weijin commented on TS-1405:
If an event is cancelled between being dequeued from the protectQueue and being inserted into the PriorityEventQueue, how do we free it as soon as possible? Should we check the cancel flag before putting it into the PriorityEventQueue?
apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: kuotai Assignee: kuotai Fix For: 3.3.1 Attachments: linux_time_wheel.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447455#comment-13447455 ]

kuotai commented on TS-1405:

This is because the original scheduler's accuracy is 5 ms (an event due within 5 ms is inserted into after[0] and processed in the next loop), so some events cannot be processed, and the thread then enters epoll_wait (sleeps). The new patch changes to 5 ms as well. Test results with the new patch:

{code}
orig:
[root@test58 ~]# ab -n 500000 -c 50 -k -H "Host: ts.cn" http://115.238.23.222:8080/1024/1.bmp
This is ApacheBench, Version 2.3 $Revision: 655654 $
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 115.238.23.222 (be patient)
Completed 50000 requests
Completed 100000 requests
Completed 150000 requests
Completed 200000 requests
Completed 250000 requests
Completed 300000 requests
Completed 350000 requests
Completed 400000 requests
Completed 450000 requests
Completed 500000 requests
Finished 500000 requests

Server Software:        ATS/3.2.0
Server Hostname:        115.238.23.222
Server Port:            8080

Document Path:          /1024/1.bmp
Document Length:        1024 bytes

Concurrency Level:      50
Time taken for tests:   34.522 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    500000
Total transferred:      691500000 bytes
HTML transferred:       512000000 bytes
Requests per second:    14483.42 [#/sec] (mean)
Time per request:       3.452 [ms] (mean)
Time per request:       0.069 [ms] (mean, across all concurrent requests)
Transfer rate:          19561.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:     0    3  10.8      1     316
Waiting:        0    3  10.8      1     285
Total:          0    3  10.8      1     316

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      3
  95%     20
  98%     41
  99%     52
 100%    316 (longest request)

time_wheel:
[root@test58 ~]# ab -n 500000 -c 50 -k -H "Host: ts.cn" http://115.238.23.222:8080/1024/1.bmp
This is ApacheBench, Version 2.3 $Revision: 655654 $
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 115.238.23.222 (be patient)
Completed 50000 requests
Completed 100000 requests
Completed 150000 requests
Completed 200000 requests
Completed 250000 requests
Completed 300000 requests
Completed 350000 requests
Completed 400000 requests
Completed 450000 requests
Completed 500000 requests
Finished 500000 requests

Server Software:        ATS/3.2.0
Server Hostname:        115.238.23.222
Server Port:            8080

Document Path:          /1024/1.bmp
Document Length:        1024 bytes

Concurrency Level:      50
Time taken for tests:   35.486 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    500000
Total transferred:      691500000 bytes
HTML transferred:       512000000 bytes
Requests per second:    14090.22 [#/sec] (mean)
Time per request:       3.549 [ms] (mean)
Time per request:       0.071 [ms] (mean, across all concurrent requests)
Transfer rate:          19030.05 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0  29.1      0    3000
Processing:     0    3  10.2      1     263
Waiting:        0    3  10.2      1     263
Total:          0    4  31.4      1    3262

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%     20
  98%     40
  99%     51
 100%   3262 (longest request)
{code}

apply time-wheel scheduler about event system
Key: TS-1405
URL: https://issues.apache.org/jira/browse/TS-1405
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
Fix For: 3.3.1
Attachments: time-wheel.patch, time_wheel_v2.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447456#comment-13447456 ]

kuotai commented on TS-1405:

{code}
ab -n 500000 -c 1000 -k -H "Host: ts.cn" http://115.238.23.222:8080/1024/1.bmp

orig:
[root@test58 ~]# ab -n 500000 -c 1000 -k -H "Host: ts.cn" http://115.238.23.222:8080/1024/1.bmp
This is ApacheBench, Version 2.3 $Revision: 655654 $
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 115.238.23.222 (be patient)
Completed 50000 requests
Completed 100000 requests
Completed 150000 requests
Completed 200000 requests
Completed 250000 requests
Completed 300000 requests
Completed 350000 requests
Completed 400000 requests
Completed 450000 requests
Completed 500000 requests
Finished 500000 requests

Server Software:        ATS/3.2.0
Server Hostname:        115.238.23.222
Server Port:            8080

Document Path:          /1024/1.bmp
Document Length:        1024 bytes

Concurrency Level:      1000
Time taken for tests:   41.269 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    500000
Total transferred:      691506050 bytes
HTML transferred:       512001024 bytes
Requests per second:    12115.56 [#/sec] (mean)
Time per request:       82.538 [ms] (mean)
Time per request:       0.083 [ms] (mean, across all concurrent requests)
Transfer rate:          16363.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0  30.6      0    3002
Processing:     0   82  83.8     56    3105
Waiting:        0   81  82.7     56    3105
Total:          0   82  89.4     56    3676

Percentage of the requests served within a certain time (ms)
  50%     56
  66%     87
  75%    112
  80%    131
  90%    193
  95%    253
  98%    328
  99%    383
 100%   3676 (longest request)

time_wheel:
[root@test58 ~]# ab -n 500000 -c 1000 -k -H "Host: ts.cn" http://115.238.23.222:8080/1024/1.bmp
This is ApacheBench, Version 2.3 $Revision: 655654 $
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 115.238.23.222 (be patient)
Completed 50000 requests
Completed 100000 requests
Completed 150000 requests
Completed 200000 requests
Completed 250000 requests
Completed 300000 requests
Completed 350000 requests
Completed 400000 requests
Completed 450000 requests
Completed 500000 requests
Finished 500000 requests

Server Software:        ATS/3.2.0
Server Hostname:        115.238.23.222
Server Port:            8080

Document Path:          /1024/1.bmp
Document Length:        1024 bytes

Concurrency Level:      1000
Time taken for tests:   35.423 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    500000
Total transferred:      691504308 bytes
HTML transferred:       512000000 bytes
Requests per second:    14115.08 [#/sec] (mean)
Time per request:       70.846 [ms] (mean)
Time per request:       0.071 [ms] (mean, across all concurrent requests)
Transfer rate:          19063.74 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0  30.0      0    3002
Processing:     0   70  69.6     51    3033
Waiting:        0   70  68.7     51    3033
Total:          0   71  76.2     51    3346

Percentage of the requests served within a certain time (ms)
  50%     51
  66%     76
  75%     96
  80%    110
  90%    158
  95%    210
  98%    276
  99%    326
 100%   3346 (longest request)
{code}

apply time-wheel scheduler about event system
Key: TS-1405
URL: https://issues.apache.org/jira/browse/TS-1405
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
Fix For: 3.3.1
Attachments: time-wheel.patch, time_wheel_v2.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443216#comment-13443216 ]

Leif Hedstrom commented on TS-1405:

I'm wondering, with these improvements (they are improvements, right? :) ), could we get rid of inactivity cop, and enable the old code path which injected inactivity events? I believe the inactivity cop was added as a response to performance concerns with the events, but right now inactivity cop can itself be a serious performance problem.

apply time-wheel scheduler about event system
Key: TS-1405
URL: https://issues.apache.org/jira/browse/TS-1405
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
Fix For: 3.3.0
Attachments: time-wheel.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443575#comment-13443575 ]

John Plevyak commented on TS-1405:

The current code should have a complexity which is bounded by the need to scan the entire queue every 5 seconds. This is necessary because cancelling an event involves setting the volatile cancelled flag, and not scanning for cancelled events would result in running out of memory.

Assuming an event is inserted with a 30 second timeout and waits till it runs, it will be touched 30/5 + 10 = 6 + 10 = 16 times. For a 300 second timeout it will be touched 300/5 + 10 = 60 + 10 = 70 times. If an event is cancelled (the normal case for timeouts), then it will be touched once (after an average of 2.5 seconds).

So, at least according to the design, the cost of the current design should be only a small constant factor worse than the time wheel and should average slightly more than 1 touch per event, which is the best that can be expected. Of course, that is the design; if it is causing problems, then likely there is a bug or something about the workload which is causing problems.

The time wheel can bring this down to 1 touch every N seconds, with an expected 1 touch per event, or 6 and 60 for the examples above.

So, I think this is a very reasonable change, assuming that it can deal with the out-of-memory issue, and I am interested in seeing the benchmarks, as I am curious to see how the theory and practice collide.

apply time-wheel scheduler about event system
Key: TS-1405
URL: https://issues.apache.org/jira/browse/TS-1405
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
Fix For: 3.3.0
Attachments: time-wheel.patch
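The touch counts in the comment above can be written out as a small calculation. This is just the arithmetic as stated there, not a measurement: the function and constant names are made up for illustration, and the time-wheel interval N = 5 seconds is an assumption inferred from the "6 and 60" figures.

```cpp
#include <cassert>

// Figures taken from the comment: a full-queue scan every 5 seconds, plus
// the constant "+ 10" term for an event that waits until it runs.
constexpr int kScanIntervalSec = 5;
constexpr int kExtraTouches    = 10;

// Touches for an event that runs to expiry under the current design.
constexpr int touches_current(int timeout_sec)
{
  return timeout_sec / kScanIntervalSec + kExtraTouches;
}

// Touches under a time wheel that visits each pending event once every
// n_sec seconds (n_sec = 5 is an assumption, see the lead-in).
constexpr int touches_time_wheel(int timeout_sec, int n_sec)
{
  return timeout_sec / n_sec;
}

static_assert(touches_current(30) == 16, "30 s timeout, current design");
static_assert(touches_current(300) == 70, "300 s timeout, current design");
static_assert(touches_time_wheel(30, 5) == 6, "30 s timeout, time wheel");
static_assert(touches_time_wheel(300, 5) == 60, "300 s timeout, time wheel");
```

For long timeouts the two designs differ mainly by the constant term, which is why the comment calls the current design only a small constant factor worse.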
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443583#comment-13443583 ]

John Plevyak commented on TS-1405:

Sorry, the numbers for 30 seconds should be 30/5 + ~17 (every time a power-of-2 bucket is touched, 1/2 of the elements will be moved out, and 1/2 of those will be moved down 2 levels, etc.) = 27 vs 7 for the time wheel.

So the time wheel, in the case of short expired timeouts, can be several times more efficient.

apply time-wheel scheduler about event system
Key: TS-1405
URL: https://issues.apache.org/jira/browse/TS-1405
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
Fix For: 3.3.0
Attachments: time-wheel.patch
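For readers unfamiliar with the data structure under discussion, here is a minimal single-level time wheel. It is a generic textbook-style sketch, not the code in the attached patches: `TimeWheelSketch`, its tick granularity and its slot count are all assumptions made for illustration. Insertion is O(1), and a pending event is touched only when the wheel passes its slot, once per rotation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <list>
#include <utility>
#include <vector>

// Hypothetical single-level time wheel; not the implementation from the patch.
class TimeWheelSketch
{
public:
  using Callback = std::function<void()>;

  // A slot count of 0 is bumped to 1 to keep the modular arithmetic valid.
  explicit TimeWheelSketch(std::size_t slots) : slots_(slots == 0 ? 1 : slots) {}

  // O(1) insert: choose the slot by modular arithmetic and record how many
  // full rotations must pass first. A delay of 0 is treated as 1 tick.
  void schedule(std::uint64_t delay_ticks, Callback cb)
  {
    if (delay_ticks == 0) {
      delay_ticks = 1;
    }
    const std::uint64_t n  = slots_.size();
    const std::size_t slot = static_cast<std::size_t>((current_ + delay_ticks) % n);
    const std::uint64_t rounds = (delay_ticks - 1) / n;
    slots_[slot].push_back(Entry{rounds, std::move(cb)});
  }

  // Advance one tick. Entries in the new slot with no rotations left are run;
  // the others have their rotation count decremented. Returns callbacks run.
  std::size_t tick()
  {
    current_ = (current_ + 1) % slots_.size();
    std::list<Entry> &slot = slots_[current_];
    std::list<Entry> due;
    for (auto it = slot.begin(); it != slot.end();) {
      auto next = std::next(it);
      if (it->rounds == 0) {
        due.splice(due.end(), slot, it); // detach before running callbacks
      } else {
        --it->rounds;
      }
      it = next;
    }
    for (Entry &e : due) {
      e.cb();
    }
    return due.size();
  }

private:
  struct Entry {
    std::uint64_t rounds;
    Callback cb;
  };

  std::vector<std::list<Entry>> slots_;
  std::size_t current_ = 0;
};
```

Due entries are detached from the slot before their callbacks run, so a callback can safely schedule new events, including into the slot currently being processed.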
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443716#comment-13443716 ]

kuotai commented on TS-1405:

Thanks for your comments :-) Yes, we will run more tests. In my environment (cluster mode), TS handles 15K+ QPS, with 200K+ (20W+) events in the scheduler.

apply time-wheel scheduler about event system
Key: TS-1405
URL: https://issues.apache.org/jira/browse/TS-1405
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
Fix For: 3.3.0
Attachments: time-wheel.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443773#comment-13443773 ]

Leif Hedstrom commented on TS-1405:

Hmmm, I need to play with this some more, but with few connections (300), the time wheel patch has very noticeable performance degradation. Just doing a quick test (I will fiddle with it some more), I get:

{code}
http_load -parallel 100 -seconds 20 -keep_alive 100 /tmp/URL
2644059 fetches on 26310 conns, 300 max parallel, 2.644059E+06 bytes, in 20 seconds
1 mean bytes/fetch
132202.7 fetches/sec, 1.322027E+05 bytes/sec
msecs/connect: 0.156 mean, 1.884 max, 0.048 min
msecs/first-response: 2.156 mean, 82.044 max, 0.076 min

tinkerballa (21:15) 272/0 $ ~/benchit.sh 100 20 100
http_load -parallel 100 -seconds 20 -keep_alive 100 /tmp/URL
3275553 fetches on 32567 conns, 300 max parallel, 3.275550E+06 bytes, in 20 seconds
1 mean bytes/fetch
163776.5 fetches/sec, 1.637765E+05 bytes/sec
msecs/connect: 0.171 mean, 2.251 max, 0.047 min
msecs/first-response: 1.440 mean, 117.784 max, 0.090 min
{code}

The first is with the time wheel patch, the second is basic trunk (which is still a little slower than I would normally see it, need to look into that too). But both throughput (QPS) and latency are worse with the patch.

apply time-wheel scheduler about event system
Key: TS-1405
URL: https://issues.apache.org/jira/browse/TS-1405
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
Fix For: 3.3.0
Attachments: time-wheel.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443774#comment-13443774 ]

Leif Hedstrom commented on TS-1405:

I should point out that CPU usage is lower with the time wheel patch. So perhaps there is lock contention or something similar that triggers now, preventing us from consuming all available CPU?

apply time-wheel scheduler about event system
Key: TS-1405
URL: https://issues.apache.org/jira/browse/TS-1405
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
Fix For: 3.3.0
Attachments: time-wheel.patch