[jira] [Commented] (CASSANDRA-7761) Upgrade netty and enable epoll event loop

2014-10-09 Thread T Jake Luciani (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165763#comment-14165763 ]

T Jake Luciani commented on CASSANDRA-7761:
---

Also added SO_LINGER = 0 in 1bae25ace7cd25487aa5cff83447069cd80d3e42.

I was seeing lots of sockets in TIME_WAIT between consecutive stress runs.
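
For context, here is a minimal sketch of what a zero linger timeout means at the socket level. It uses only java.net and is not the code from the commit above. The class and method names are hypothetical. With SO_LINGER enabled and a timeout of 0, close() resets the connection instead of going through the normal shutdown handshake, so the closing side does not leave a socket behind in TIME_WAIT; the trade-off is that any unsent data is discarded. In Netty the corresponding knob is ChannelOption.SO_LINGER, though how the commit wires it up is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; not part of Cassandra or Netty.
final class LingerSketch {
    /**
     * Opens a loopback connection, enables SO_LINGER with a zero timeout on the
     * client side, and returns the linger value the socket reports back.
     */
    static int connectWithZeroLinger() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; the listener exists only for this demo.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket accepted = listener.accept()) {
            // on = true, linger = 0 seconds: close() drops unsent data and resets
            // the connection rather than performing the usual orderly shutdown.
            client.setSoLinger(true, 0);
            return client.getSoLinger();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_LINGER = " + connectWithZeroLinger());
    }
}
```

An abortive close like this is usually only appropriate where losing the tail of a response is acceptable, such as benchmark runs that open and tear down many connections in quick succession.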

 Upgrade netty and enable epoll event loop
 -----------------------------------------

                 Key: CASSANDRA-7761
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7761
             Project: Cassandra
          Issue Type: Improvement
            Reporter: T Jake Luciani
            Assignee: T Jake Luciani
            Priority: Minor
             Fix For: 2.1.1


 Latest netty contains the proper fix for CASSANDRA-7695 plus some of the
 performance patches [~benedict] contributed. We should upgrade to this
 following extensive burn-in testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-08 Thread Jason Brown (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163883#comment-14163883 ]

Jason Brown commented on CASSANDRA-7761:


I ran the latest branch on my cluster, and the results were quite good. For 
writes, throughput was about the same, but latencies were ~10% better on the 
branch. For reads, throughput was about 10% better and latencies 15-20% better 
on the branch.

+1 for the current branch (with just epoll changes)





[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-07 Thread T Jake Luciani (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162517#comment-14162517 ]

T Jake Luciani commented on CASSANDRA-7761:
---

I've opened up CASSANDRA-8075 to start looking into making the pipeline fully 
async.

Here is a patch that simply upgrades netty and uses epoll on Linux:
https://github.com/tjake/cassandra/tree/netty-update
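
As a rough illustration of "epoll on Linux, NIO everywhere else", the selection logic can be sketched without Netty on the classpath. The class, enum, and method names below are hypothetical, and the linked branch may be structured differently. In Netty 4 the availability check would normally be Epoll.isAvailable(), with EpollEventLoopGroup and EpollServerSocketChannel taking the place of NioEventLoopGroup and NioServerSocketChannel; here that check is passed in as a plain boolean.

```java
import java.util.Locale;

// Hypothetical demo class; the real choice would be made where the server is bootstrapped.
final class TransportChoiceSketch {
    enum Transport { EPOLL, NIO }

    /**
     * Picks the native epoll transport only on Linux and only when the native
     * library is usable; every other case falls back to the portable NIO transport.
     */
    static Transport choose(String osName, boolean nativeEpollUsable) {
        boolean linux = osName.toLowerCase(Locale.ROOT).startsWith("linux");
        return (linux && nativeEpollUsable) ? Transport.EPOLL : Transport.NIO;
    }

    public static void main(String[] args) {
        // Assumption: nativeEpollUsable would come from Netty's Epoll.isAvailable();
        // it is hard-coded here so the sketch runs with the JDK alone.
        System.out.println(choose(System.getProperty("os.name"), true));
    }
}
```

Keeping the fallback explicit means the same build still starts on platforms where the native library cannot be loaded.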

The benchmark results show a tiny improvement under epoll. 

http://cstar.datastax.com/graph?stats=2c82beac-4e5c-11e4-8caa-bc764e04482c&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=405.68&ymin=0&ymax=62528.4

 Upgrade netty
 -

 Key: CASSANDRA-7761
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7761
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
 Fix For: 3.0




[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-02 Thread Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156443#comment-14156443 ]

Sylvain Lebresne commented on CASSANDRA-7761:
-

bq. The primary change is to avoid using a separate thread pool for the 
dispatch step and re-use the nio threads.

I don't think we want to do that, not until we have CASSANDRA-5239 (or
something approaching it). Currently the dispatch is really a blocking step
that executes the query and only returns when it's complete. If you use NIO
threads, you can't have more in-flight queries than you have NIO threads, and
you don't have a whole lot of those, if my knowledge of Netty isn't too outdated.
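
The bound described above can be seen with a small JDK-only sketch. It is a stand-in, not Cassandra's dispatch code. A fixed thread pool plays the role of the NIO threads and Thread.sleep plays the role of a blocking query. All names are hypothetical. However many requests are submitted, the number executing at once never exceeds the pool size.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; the pool stands in for the NIO event loop threads.
final class InFlightSketch {
    /**
     * Runs {@code requests} blocking "queries" on a pool of {@code ioThreads}
     * threads and returns the highest number that were executing at the same time.
     */
    static int peakInFlight(int ioThreads, int requests, long queryMillis) {
        ExecutorService ioPool = Executors.newFixedThreadPool(ioThreads);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        try {
            for (int i = 0; i < requests; i++) {
                ioPool.execute(() -> {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max); // remember the high-water mark
                    try {
                        Thread.sleep(queryMillis); // stands in for a blocking query
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        done.countDown();
                    }
                });
            }
            done.await(); // wait until every request has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ioPool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("2 threads:  peak in-flight = " + peakInFlight(2, 16, 20));
        System.out.println("16 threads: peak in-flight = " + peakInFlight(16, 16, 20));
    }
}
```

With a separate, larger dispatch pool the peak can rise to that pool's size. Without one, it is capped by the handful of I/O threads unless the query path itself becomes non-blocking.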



[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-01 Thread T Jake Luciani (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155426#comment-14155426 ]

T Jake Luciani commented on CASSANDRA-7761:
---

I've done some work to further optimize our netty server and have seen some
significant gains.
The primary change is to avoid using a separate thread pool for the dispatch
step and re-use the nio threads. This cuts a large amount of latency from the
request, as you can see from the runs below (at 4, 8, and 16 threads; at higher
thread counts C* becomes the bottleneck). It also cuts a large amount of CPU
usage and thread switching. I also switched on epoll.

Sharing the thread pool also lets netty naturally batch responses, so we no
longer need CASSANDRA-5663.

I need to do further testing on our cstar clusters, but initial results look
promising and the code is cleaner.

{code}
--Current trunk--

./bin/cassandra -f  152.08s user 25.14s system 173% cpu 1:42.06 total

Results:
op rate   : 45473
partition rate: 45473
row rate  : 45473
latency mean  : 20.2
latency median: 16.9
latency 95th percentile   : 48.4
latency 99th percentile   : 75.2
latency 99.9th percentile : 116.4
latency max   : 352.5
total gc count: 39
total gc mb   : 12477
total gc time (s) : 2
avg gc time(ms)   : 43
stdev gc time(ms) : 10
Total operation time  : 00:00:33
Improvement over 609 threadCount: 3%
             id, total ops, adj row/s,   op/s,   pk/s,  row/s,  mean,   med,   .95,   .99,  .999,   max,  time,  stderr, gc: #, max ms, sum ms, sdv ms,    mb
  4 threadCount,    372704,        -0,  12150,  12150,  12150,   0.3,   0.3,   0.5,   0.9,   4.0,  44.7,  30.7, 0.01078,    20,    625,    625,      5,  6352
  8 threadCount,    566480,     18307,  18289,  18289,  18289,   0.4,   0.4,   0.7,   1.3,   5.6,  56.9,  31.0, 0.01124,    23,    781,    781,      6,  7320
 16 threadCount,    771731,     24763,  24739,  24739,  24739,   0.6,   0.5,   1.2,   2.5,  18.3,  66.5,  31.2, 0.01758,    25,    885,    885,      7,  7980
 24 threadCount,    916588,     29341,  29312,  29312,  29312,   0.8,   0.6,   1.6,   3.7,  12.3,  52.5,  31.3, 0.01256,    26,    899,    899,      6,  8308
 36 threadCount,   1039678,     33081,  33068,  33068,  33068,   1.1,   0.8,   2.2,   5.9,  23.5,  59.2,  31.4, 0.00985,    29,    986,    986,      7,  9271
 54 threadCount,   1123610,     35823,  35780,  35780,  35780,   1.5,   1.1,   3.2,   8.9,  36.1,  83.6,  31.4, 0.02015,    30,   1104,   1169,     11,  9581
 81 threadCount,   1185809,        -0,  37260,  37260,  37260,   2.2,   1.6,   4.7,  13.0,  44.0, 300.7,  31.8, 0.01640,    32,   1074,   1169,      8, 10235
121 threadCount,   1275470,        -0,  40124,  40124,  40124,   3.0,   2.4,   6.3,  14.7,  43.7,  71.0,  31.8, 0.01488,    33,   1053,   1152,      7, 10556
181 threadCount,   1326379,     41472,  41413,  41413,  41413,   4.4,   3.5,   9.3,  22.1,  51.4,  84.0,  32.0, 0.01061,    34,   1116,   1277,      7, 10876
271 threadCount,   1340955,     41138,  41060,  41060,  41060,   6.6,   4.9,  17.8,  36.3,  63.2, 125.8,  32.7, 0.01902,    35,   1234,   1375,     11, 11191
406 threadCount,   1418529,     43033,  42978,  42978,  42978,   9.4,   7.3,  24.3,  44.2,  81.3, 299.8,  33.0, 0.01606,    36,   1172,   1432,      9, 11517
609 threadCount,   1465946,     44234,  44159,  44159,  44159,  13.8,  11.6,  33.5,  53.3,  94.5, 135.8,  33.2, 0.01105,    37,   1183,   1428,      9, 11837
913 threadCount,   1544584,     45547,  45473,  45473,  45473,  20.2,  16.9,  48.4,  75.2, 116.4, 352.5,  34.0, 0.00953,    39,   1324,   1663,     10, 12477
END





--Netty fixes--
./bin/cassandra -f  110.27s user 13.83s system 116% cpu 1:46.80 total

Results:
op rate   : 45506
partition rate: 45506
row rate  : 45506
latency mean  : 20.1
latency median: 17.3
latency 95th percentile   : 40.6
latency 99th percentile   : 69.4
latency 99.9th percentile : 148.4
latency max   : 261.7
total gc count: 38
total gc mb   : 12154
total gc time (s) : 2
avg gc time(ms)   : 40
stdev gc time(ms) : 10
Total operation time  : 00:00:33
Improvement over 609 threadCount: 2%
             id, total ops, adj row/s,   op/s,   pk/s,  row/s,  mean,   med,   .95,   .99,  .999,   max,  time,  stderr, gc: #, max ms, sum ms, sdv ms,    mb
  4 threadCount, 549137,-0,   

[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-01 Thread Jason Brown (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155508#comment-14155508 ]

Jason Brown commented on CASSANDRA-7761:


[~tjake] Did you have a chance to try out the native epoll implementation?
Granted, it's Linux-only, but [~norman] seems to be pretty excited about it.



[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-01 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155521#comment-14155521
 ] 

T Jake Luciani commented on CASSANDRA-7761:
---

[~jasobrown] Yes, this uses it.



[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-01 Thread Ryan McGuire (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155874#comment-14155874 ]

Ryan McGuire commented on CASSANDRA-7761:
-

I didn't spend any time tweaking this, but the first pass doesn't look good:

http://cstar.datastax.com/tests/id/190386be-49b5-11e4-a9ac-bc764e04482c




[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-01 Thread T Jake Luciani (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156012#comment-14156012 ]

T Jake Luciani commented on CASSANDRA-7761:
---

So I think using the IO threads is causing the pipeline to stall in this
test... Not sure why it's not happening locally.



[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-01 Thread Ryan McGuire (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156035#comment-14156035 ]

Ryan McGuire commented on CASSANDRA-7761:
-

[~tjake] Do you mean it works for you because stress is run from the same
machine as C*? I haven't tried that yet. Here is the result for one node (but
stress is still on a different machine than that node):

http://cstar.datastax.com/tests/id/ae34ba9e-49da-11e4-9709-bc764e04482c



[jira] [Commented] (CASSANDRA-7761) Upgrade netty

2014-10-01 Thread Jason Brown (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156071#comment-14156071 ]

Jason Brown commented on CASSANDRA-7761:


I ran this branch on my three-node cluster, and unfortunately the result was
pretty bad. I ran the trunk code (the commit previous to tjake's changes),
which entails running the insert for one round, then the reads for three
rounds. I used the incrementing thread count version of stress, starting with
4 threads, then 8, and all the way up to 913 threads. The branch code, however,
failed to get past 181 threads on the insert stage, and the subsequent read was
rather bad (probably because not enough data was inserted). Even at the
181-thread mark, throughput was half that of trunk, and latencies were more
than double.

That being said, I reviewed the code and I like where it's going; we just need
to work out the performance kinks.
