[jira] [Commented] (CASSANDRA-7761) Upgrade netty and enable epoll event loop
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165763#comment-14165763 ]

T Jake Luciani commented on CASSANDRA-7761:

Also added SO_LINGER = 0 in 1bae25ace7cd25487aa5cff83447069cd80d3e42. I was seeing lots of sockets in TIME_WAIT between consecutive stress runs.

Upgrade netty and enable epoll event loop

Key: CASSANDRA-7761
URL: https://issues.apache.org/jira/browse/CASSANDRA-7761
Project: Cassandra
Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
Fix For: 2.1.1

Latest netty contains the proper fix for CASSANDRA-7695 plus some of the performance patches [~benedict] contributed. We should upgrade to this following extensive burn-in testing.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
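To illustrate what SO_LINGER = 0 means at the socket level, here is a minimal sketch using only `java.net.Socket`. It is not the code from the commit above, whose contents are not shown in this thread; the class and method names are hypothetical. With linger enabled and a timeout of 0, closing a connected socket resets the connection instead of performing the normal shutdown, so that socket does not sit in TIME_WAIT afterwards. In Netty the corresponding setting is `ChannelOption.SO_LINGER`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical illustration; not taken from the Cassandra commit.
public class LingerZeroSketch {

    /** Enables SO_LINGER with a zero timeout and returns the value the socket reports back. */
    static int configuredLinger() {
        try (Socket socket = new Socket()) {   // unconnected socket, no network needed
            socket.setSoLinger(true, 0);       // on = true, linger time = 0 seconds
            return socket.getSoLinger();       // -1 would mean the option is disabled
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_LINGER seconds: " + configuredLinger());
    }
}
```

The trade-off is that a reset discards any data still queued for sending, which is usually acceptable for a benchmark client being torn down but worth keeping in mind elsewhere.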
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163883#comment-14163883 ]

Jason Brown commented on CASSANDRA-7761:

I ran the latest branch on my cluster, and the results were quite good. For writes, throughput was about the same, but latencies were ~10% better on the branch. For reads, throughput was about 10% better and latencies 15-20% better on the branch.

+1 for the current branch (with just epoll changes)
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162517#comment-14162517 ]

T Jake Luciani commented on CASSANDRA-7761:

I've opened up CASSANDRA-8075 to start looking into making the pipeline fully async.

Here is a patch for simply upgrading netty and using epoll under linux.

https://github.com/tjake/cassandra/tree/netty-update

The benchmark results show a tiny improvement under epoll.

http://cstar.datastax.com/graph?stats=2c82beac-4e5c-11e4-8caa-bc764e04482c&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=405.68&ymin=0&ymax=62528.4

Upgrade netty

Key: CASSANDRA-7761
URL: https://issues.apache.org/jira/browse/CASSANDRA-7761
Project: Cassandra
Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
Fix For: 3.0
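The "epoll under linux" choice can be pictured as a simple fallback rule: use the native transport only when it is usable, otherwise stay on NIO. The sketch below is a dependency-free illustration of that rule, not the patch itself; `TransportChoiceSketch`, `Transport`, and `choose` are hypothetical names. In real Netty code the availability check is `io.netty.channel.epoll.Epoll.isAvailable()`, and the epoll transport is built from `EpollEventLoopGroup` and `EpollServerSocketChannel`.

```java
import java.util.Locale;

// Hypothetical illustration of the selection rule; the real patch uses Netty classes.
public class TransportChoiceSketch {

    enum Transport { EPOLL, NIO }

    /**
     * Picks the native epoll transport only on Linux and only when the native
     * library loaded; every other combination falls back to NIO.
     */
    static Transport choose(String osName, boolean nativeEpollLoaded) {
        boolean linux = osName != null
                && osName.toLowerCase(Locale.ROOT).contains("linux");
        return (linux && nativeEpollLoaded) ? Transport.EPOLL : Transport.NIO;
    }

    public static void main(String[] args) {
        // Assumption for the demo: pretend the native library is present.
        System.out.println(choose(System.getProperty("os.name"), true));
    }
}
```

Keeping the NIO path as the default means non-Linux platforms behave exactly as before the upgrade.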
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156443#comment-14156443 ]

Sylvain Lebresne commented on CASSANDRA-7761:

bq. The primary change is to avoid using a separate thread pool for the dispatch step and re-use the nio threads.

I don't think we want to do that, not until we have CASSANDRA-5239 (or something approaching it). Currently the dispatch is really a blocking step that executes the query and only returns when it's complete. If you use NIO threads, you can't have more in-flight queries going on than you have NIO threads, and you don't have a whole lot of those, if my knowledge of Netty isn't too outdated.
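The concurrency limit described above can be reproduced with a plain thread pool. The following sketch is only a model: a fixed pool stands in for the NIO threads and `Thread.sleep` stands in for a query that blocks until it completes. All names are hypothetical and nothing here is Cassandra or Netty code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of blocking dispatch on a small, fixed set of I/O threads.
public class BlockingDispatchSketch {

    /** Runs `requests` blocking tasks on `threads` threads and returns the peak number executing at once. */
    static int peakInFlight(int threads, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);  // record the high-water mark
                try {
                    Thread.sleep(20);                   // stands in for a blocking query
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    done.countDown();
                }
            });
        }
        try {
            done.await();                               // wait for every request to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight with 2 threads: " + peakInFlight(2, 16));
        System.out.println("peak in-flight with 8 threads: " + peakInFlight(8, 16));
    }
}
```

However many requests are queued, the peak can never exceed the number of threads, which is why a blocking dispatch step on a small event-loop pool caps the number of in-flight queries.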
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155426#comment-14155426 ]

T Jake Luciani commented on CASSANDRA-7761:

I've done some work to further optimize our netty server and gotten some significant gains.

The primary change is to avoid using a separate thread pool for the dispatch step and re-use the nio threads. This cuts a large amount of latency from the request, as you can see from the runs below (4, 8, and 16 threads; at higher thread counts C* becomes the bottleneck). It also cuts a large amount of CPU and thread switching. Also switched on epoll.

Sharing the thread pool also lets netty naturally batch responses, so we no longer need CASSANDRA-5663.

I need to do further testing on our cstar clusters, but initial results look promising and the code is cleaner.

{code}
--Current trunk--
./bin/cassandra -f 152.08s user 25.14s system 173% cpu 1:42.06 total

Results:
op rate                   : 45473
partition rate            : 45473
row rate                  : 45473
latency mean              : 20.2
latency median            : 16.9
latency 95th percentile   : 48.4
latency 99th percentile   : 75.2
latency 99.9th percentile : 116.4
latency max               : 352.5
total gc count            : 39
total gc mb               : 12477
total gc time (s)         : 2
avg gc time(ms)           : 43
stdev gc time(ms)         : 10
Total operation time      : 00:00:33
Improvement over 609 threadCount: 3%

id, total ops, adj row/s, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, gc: #, max ms, sum ms, sdv ms, mb
4 threadCount, 372704, -0, 12150, 12150, 12150, 0.3, 0.3, 0.5, 0.9, 4.0, 44.7, 30.7, 0.01078, 20, 625, 625, 5, 6352
8 threadCount, 566480, 18307, 18289, 18289, 18289, 0.4, 0.4, 0.7, 1.3, 5.6, 56.9, 31.0, 0.01124, 23, 781, 781, 6, 7320
16 threadCount, 771731, 24763, 24739, 24739, 24739, 0.6, 0.5, 1.2, 2.5, 18.3, 66.5, 31.2, 0.01758, 25, 885, 885, 7, 7980
24 threadCount, 916588, 29341, 29312, 29312, 29312, 0.8, 0.6, 1.6, 3.7, 12.3, 52.5, 31.3, 0.01256, 26, 899, 899, 6, 8308
36 threadCount, 1039678, 33081, 33068, 33068, 33068, 1.1, 0.8, 2.2, 5.9, 23.5, 59.2, 31.4, 0.00985, 29, 986, 986, 7, 9271
54 threadCount, 1123610, 35823, 35780, 35780, 35780, 1.5, 1.1, 3.2, 8.9, 36.1, 83.6, 31.4, 0.02015, 30, 1104, 1169, 11, 9581
81 threadCount, 1185809, -0, 37260, 37260, 37260, 2.2, 1.6, 4.7, 13.0, 44.0, 300.7, 31.8, 0.01640, 32, 1074, 1169, 8, 10235
121 threadCount, 1275470, -0, 40124, 40124, 40124, 3.0, 2.4, 6.3, 14.7, 43.7, 71.0, 31.8, 0.01488, 33, 1053, 1152, 7, 10556
181 threadCount, 1326379, 41472, 41413, 41413, 41413, 4.4, 3.5, 9.3, 22.1, 51.4, 84.0, 32.0, 0.01061, 34, 1116, 1277, 7, 10876
271 threadCount, 1340955, 41138, 41060, 41060, 41060, 6.6, 4.9, 17.8, 36.3, 63.2, 125.8, 32.7, 0.01902, 35, 1234, 1375, 11, 11191
406 threadCount, 1418529, 43033, 42978, 42978, 42978, 9.4, 7.3, 24.3, 44.2, 81.3, 299.8, 33.0, 0.01606, 36, 1172, 1432, 9, 11517
609 threadCount, 1465946, 44234, 44159, 44159, 44159, 13.8, 11.6, 33.5, 53.3, 94.5, 135.8, 33.2, 0.01105, 37, 1183, 1428, 9, 11837
913 threadCount, 1544584, 45547, 45473, 45473, 45473, 20.2, 16.9, 48.4, 75.2, 116.4, 352.5, 34.0, 0.00953, 39, 1324, 1663, 10, 12477
END

--Netty fixes--
./bin/cassandra -f 110.27s user 13.83s system 116% cpu 1:46.80 total

Results:
op rate                   : 45506
partition rate            : 45506
row rate                  : 45506
latency mean              : 20.1
latency median            : 17.3
latency 95th percentile   : 40.6
latency 99th percentile   : 69.4
latency 99.9th percentile : 148.4
latency max               : 261.7
total gc count            : 38
total gc mb               : 12154
total gc time (s)         : 2
avg gc time(ms)           : 40
stdev gc time(ms)         : 10
Total operation time      : 00:00:33
Improvement over 609 threadCount: 2%

id, total ops, adj row/s, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, gc: #, max ms, sum ms, sdv ms, mb
4 threadCount, 549137,-0,
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155508#comment-14155508 ]

Jason Brown commented on CASSANDRA-7761:

[~tjake] Did you have a chance to try out the native epoll implementation? Granted, it's Linux-only, but [~norman] seems to be pretty excited about it.
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155521#comment-14155521 ]

T Jake Luciani commented on CASSANDRA-7761:

[~jasobrown] Yes, this uses it.
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155874#comment-14155874 ]

Ryan McGuire commented on CASSANDRA-7761:

I didn't spend any time tweaking this, but the first pass doesn't look good:

http://cstar.datastax.com/tests/id/190386be-49b5-11e4-a9ac-bc764e04482c
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156012#comment-14156012 ]

T Jake Luciani commented on CASSANDRA-7761:

So I think using the IO threads is causing the pipeline to stall in this test... Not sure why it's not happening locally.
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156035#comment-14156035 ]

Ryan McGuire commented on CASSANDRA-7761:

[~tjake] do you mean it works for you because stress is run from the same machine as C*? I haven't tried that yet.

Here is the result for one node (but stress is still on a different machine than that node):

http://cstar.datastax.com/tests/id/ae34ba9e-49da-11e4-9709-bc764e04482c
[jira] [Commented] (CASSANDRA-7761) Upgrade netty
[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156071#comment-14156071 ]

Jason Brown commented on CASSANDRA-7761:

I ran this branch on my three-node cluster, and unfortunately the result was pretty bad. For comparison I ran the trunk code (the commit previous to tjake's changes); each test runs the insert for one round, then the reads for three rounds. I used the incrementing thread count version of stress, starting with 4 threads, then 8, and all the way up to 913 threads. The branch code, however, failed to get past 181 threads on the insert stage, and the subsequent read was rather bad (probably because not enough data was inserted). Even at the 181-thread mark, the throughput was half that of trunk, and latencies were more than double.

That being said, I reviewed the code and I like where it's going; we just need to work out the performance kinks.
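For reference, the thread counts that appear in the stress output earlier in this thread (4, 8, 16, 24, 36, ... 913) are consistent with doubling below 16 threads and multiplying by 1.5 (truncated to an integer) from there on. The sketch below reproduces that sequence; the rule is inferred from the observed values and is an assumption, not a statement of how the stress tool is implemented. `ThreadRampSketch` and `ramp` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the observed thread-count ramp.
public class ThreadRampSketch {

    /** Returns the thread counts from `min` up to and including `max`, if the ramp lands on it. */
    static List<Integer> ramp(int min, int max) {
        List<Integer> counts = new ArrayList<>();
        // Assumed rule: double while below 16, then grow by 1.5x, truncating fractions.
        for (int n = min; n <= max; n = (n < 16) ? n * 2 : (int) (n * 1.5)) {
            counts.add(n);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(ramp(4, 913));
    }
}
```

Under this rule, the run that stalled at 181 threads stopped at the ninth of thirteen steps.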