[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427680#comment-17427680 ]

Stefan Miklosovic commented on CASSANDRA-13780:
-----------------------------------------------

Hi [~jjirsa], is this still relevant after all these years? I think this targets 3.0, which we hardly support anymore anyway ...

> ADD Node streaming throughput performance
> -----------------------------------------
>
>                 Key: CASSANDRA-13780
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13780
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Core
>         Environment: Linux 2.6.32-696.3.2.el6.x86_64 #1 SMP Mon Jun 19 11:55:55 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                40
> On-line CPU(s) list:   0-39
> Thread(s) per core:    2
> Core(s) per socket:    10
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 79
> Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> Stepping:              1
> CPU MHz:               2199.869
> BogoMIPS:              4399.36
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              25600K
> NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
> NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>              total       used       free     shared    buffers     cached
> Mem:          252G       217G        34G       708K       308M       149G
> -/+ buffers/cache:        67G       185G
> Swap:          16G         0B        16G
>            Reporter: Kevin Rivait
>            Priority: Normal
>             Fix For: 3.0.x
>
> Problem: Adding a new node to a large cluster runs at least 1000x slower than what the network and node hardware capacity can support, taking several days per new node. Adjusting stream throughput and other YAML parameters seems to have no effect on performance. Essentially, it appears that Cassandra has an architectural scalability problem when adding new nodes to a moderate- to high-ingestion cluster, because Cassandra cannot add new node capacity fast enough to keep up with increasing data ingestion volumes and growth.
>
> Initial Configuration:
> Running 3.0.9 and have implemented TWCS on one of our largest tables.
> Largest table partitioned on (ID, MM) using 1-day buckets with a TTL of 60 days.
> Next release will change partitioning to (ID, MMDD) so that partitions are aligned with daily TWCS buckets.
> Each node is currently creating roughly a 30GB SSTable per day.
> TWCS working as expected; daily SSTables are dropping off after 70 days (60-day TTL + 10-day grace).
> Current deployment is a 28-node, 2-datacenter cluster, 14 nodes in each DC, replication factor 3.
> Data directories are backed with 4 x 2TB SSDs on each node and 1 x 800GB SSD for commit logs.
> Requirement is to double cluster size, capacity, and ingestion volume within a few weeks.
>
> Observed Behavior:
> 1. Streaming throughput during add node - we observed a maximum of 6 Mb/s streaming from each of the 14 nodes on a 20Gb/s switched network, taking at least 106 hours for each node to join the cluster, and each node is only about 2.2 TB in size.
> 2. Compaction on the newly added node - compaction has fallen behind, with anywhere from 4,000 to 10,000 SSTables at any given time. It took 3 weeks for compaction to finish on each newly added node. Increasing the number of compaction threads to match the number of CPUs (40) and increasing compaction throughput to 32MB/s seemed to be the sweet spot.
> 3. TWCS buckets on new node - data streamed to this node over 4 1/2 days. Compaction correctly placed the data in daily files, but the problem is the file dates reflect when compaction created the file and not the date of the last record written in the TWCS bucket, which will cause the files to remain around much longer than necessary.
>
> Two Questions:
> 1. What can be done to substantially improve the performance of adding a new node?
> 2. Can compaction on TWCS partitions for newly added nodes change the file create date to match the highest date record in the file -or- add another piece of metadata to the TWCS files that reflects the file drop date, so that TWCS partitions can be dropped consistently?
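The two throttles the description mentions tuning can also be adjusted at runtime rather than through cassandra.yaml; a minimal sketch using the values reported in this issue (note the units differ between the two commands):

{code}
# Compaction throttle is in MB/s; the reporter's "sweet spot" was 32.
nodetool setcompactionthroughput 32
# Stream throttle is in megabits/s; 200 is the 3.0 default, 0 disables it.
nodetool setstreamthroughput 200
{code}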
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161844#comment-16161844 ]

Kevin Rivait commented on CASSANDRA-13780:
-------------------------------------------

Jason, we are looking forward to your streaming fix because it will better address our requirement to bootstrap new nodes as quickly as possible. For the bootstrap use case, assuming the code is efficiently implemented, being CPU bound for streaming is a good thing because it will minimize bootstrap time by maximizing node utilization, since during the bootstrap phase the node is not doing any query processing anyway, right?
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150566#comment-16150566 ]

Kevin Rivait commented on CASSANDRA-13780:
-------------------------------------------

Thank you Jeff.

Regarding TWCS, agree this is a non-issue. We went back and dumped the SSTable max/min dates and verified that the buckets older than TTL and GCGS are in fact being dropped.

Regarding adding a DC, that is exactly what we did in our DEV/TEST environment, but we rebuilt nodes one at a time; we weren't sure of the consequences (if any) of rebuilding all nodes at the same time.
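For anyone repeating that verification, a minimal sketch of the kind of timestamp dump described above; the data path and keyspace/table names are illustrative, and {{sstablemetadata}} ships with the Cassandra tools:

{code}
# Print the min/max write timestamps of every SSTable in a table directory
# (path and names here are made up for illustration).
for f in /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db; do
  echo "$f"
  sstablemetadata "$f" | grep -E 'Minimum timestamp|Maximum timestamp'
done
{code}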
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144667#comment-16144667 ]

Jeff Jirsa commented on CASSANDRA-13780:
-----------------------------------------

{quote}
2. compaction on the newly added node - compaction has fallen behind, with anywhere from 4,000 to 10,000 SSTables at any given time. It took 3 weeks for compaction to finish on each newly added node. Increasing number of compaction threads to match number of CPU (40) and increasing compaction throughput to 32MB/s seemed to be the sweet spot.
{quote}

This is a known limitation of vnodes, and has been since they were introduced in 1.2. It's less awful if you use LCS, but obviously that typically requires quite a bit more IO. I'm not going to link it here, but if you google "real world dtcs for operators", I cover this for time-series use cases in my 2015 Cassandra Summit talk (which is where I introduced TWCS, to solve problems I encountered doing exactly what you're trying to do now).

{quote}
3. TWCS buckets on new node, data streamed to this node over 4 1/2 days. Compaction correctly placed the data in daily files, but the problem is the file dates reflect when compaction created the file and not the date of the last record written in the TWCS bucket, which will cause the files to remain around much longer than necessary.
{quote}

TWCS ignores the file dates; it uses the sstable's max timestamp value from the metadata. This should be respected after streaming.

{quote}
1. What can be done to substantially improve the performance of adding a new node?
{quote}

Jason has been working on fixing streaming. Right now it's CPU bound rebuilding sstable components (which is CPU intensive and generates a lot of garbage re-compressing and whatnot).

{quote}
2. Can compaction on TWCS partitions for newly added nodes change the file create date to match the highest date record in the file or add another piece of meta-data to the TWCS files that reflect the file drop date so that TWCS partitions can be dropped consistently?
{quote}

Pretty sure this is a non-issue because, as I mentioned, TWCS doesn't care about the sstable's file creation date; it uses the metadata max timestamp.

{quote}
Requirement is to double cluster size, capacity, and ingestion volume within a few weeks.
{quote}

Usually people who need to grow rapidly tend not to use vnodes, so they can bootstrap more than one node at a time. It's too late to give you that advice, though, so here's something slightly different: if you can spare the hardware (easier in a cloud environment than physical), you can add a whole new "datacenter" with RF=0, then ALTER KEYSPACE to set RF=3 (or whatever you use), then {{nodetool rebuild}} it in place to stream data to it all at once, then tear down the old one. That requires a bit of extra hardware, but lets you greatly increase cluster size very quickly.
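A rough sketch of that expand-via-new-DC sequence, assuming hypothetical names (keyspace {{my_ks}}, old DC {{DC1}}, new DC {{DC2}}); the new nodes would join with auto_bootstrap: false so nothing streams until the rebuild:

{code}
# 1. Once the DC2 nodes are up, start replicating the keyspace there
#    (keyspace and DC names are hypothetical):
cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
# 2. On every DC2 node, stream its full data set from DC1 in one pass:
nodetool rebuild -- DC1
{code}

Once the rebuilds complete and traffic has moved over, the old DC can be removed from the replication settings and decommissioned.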
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144649#comment-16144649 ]

Kurt Greaves commented on CASSANDRA-13780:
-------------------------------------------

Unfortunately I don't have time to bury my head in the streaming code. Maybe someone with a better understanding can explain.
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143687#comment-16143687 ]

Kevin Rivait commented on CASSANDRA-13780:
-------------------------------------------

Yes, we are using vnodes, num_tokens: 128 on each node. When we add a fifth node, we see 4 nodes stream to it. From the system.log:

{code}
INFO [main] 2017-08-24 14:16:56,071 StorageService.java:1170 - JOINING: Starting to bootstrap...
INFO [main] 2017-08-24 14:16:56,187 StreamResultFuture.java:87 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] Executing streaming plan for Bootstrap
INFO [StreamConnectionEstablisher:1] 2017-08-24 14:16:56,188 StreamSession.java:239 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] Starting streaming to /10.126.63.127
INFO [StreamConnectionEstablisher:2] 2017-08-24 14:16:56,188 StreamSession.java:239 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] Starting streaming to /10.126.63.124
INFO [StreamConnectionEstablisher:3] 2017-08-24 14:16:56,188 StreamSession.java:239 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] Starting streaming to /10.126.63.125
INFO [StreamConnectionEstablisher:4] 2017-08-24 14:16:56,189 StreamSession.java:239 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] Starting streaming to /10.126.63.121
INFO [StreamConnectionEstablisher:4] 2017-08-24 14:16:56,196 StreamCoordinator.java:213 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2, ID#0] Beginning stream session with /10.126.63.121
INFO [StreamConnectionEstablisher:3] 2017-08-24 14:16:56,196 StreamCoordinator.java:213 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2, ID#0] Beginning stream session with /10.126.63.125
INFO [StreamConnectionEstablisher:2] 2017-08-24 14:16:56,196 StreamCoordinator.java:213 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2, ID#0] Beginning stream session with /10.126.63.124
INFO [StreamConnectionEstablisher:1] 2017-08-24 14:16:56,196 StreamCoordinator.java:213 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2, ID#0] Beginning stream session with /10.126.63.127
INFO [STREAM-IN-/10.126.63.121] 2017-08-24 14:16:56,245 StreamResultFuture.java:169 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2 ID#0] Prepare completed. Receiving 9 files(1147643092 bytes), sending 0 files(0 bytes)
INFO [STREAM-IN-/10.126.63.127] 2017-08-24 14:16:56,245 StreamResultFuture.java:169 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2 ID#0] Prepare completed. Receiving 5 files(1354972399 bytes), sending 0 files(0 bytes)
INFO [STREAM-IN-/10.126.63.125] 2017-08-24 14:16:56,248 StreamResultFuture.java:169 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2 ID#0] Prepare completed. Receiving 9 files(1276409087 bytes), sending 0 files(0 bytes)
INFO [STREAM-IN-/10.126.63.124] 2017-08-24 14:16:56,249 StreamResultFuture.java:169 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2 ID#0] Prepare completed. Receiving 8 files(1446953252 bytes), sending 0 files(0 bytes)
INFO [StreamReceiveTask:1] 2017-08-24 14:22:28,495 StreamResultFuture.java:183 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] Session with /10.126.63.121 is complete
INFO [StreamReceiveTask:1] 2017-08-24 14:23:09,001 StreamResultFuture.java:183 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] Session with /10.126.63.125 is complete
INFO [StreamReceiveTask:1] 2017-08-24 14:23:27,289 StreamResultFuture.java:183 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] Session with /10.126.63.127 is complete
INFO [StreamReceiveTask:1] 2017-08-24 14:23:58,065 StreamResultFuture.java:183 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] Session with /10.126.63.124 is complete
INFO [StreamReceiveTask:1] 2017-08-24 14:23:58,068 StreamResultFuture.java:215 - [Stream #69767f90-88f8-11e7-aa33-f929dc1360c2] All sessions completed
{code}
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143315#comment-16143315 ]

Kurt Greaves commented on CASSANDRA-13780:
-------------------------------------------

To clarify, you are using vnodes? When you bootstrap the new node, how many nodes stream to it?
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143223#comment-16143223 ]

Jeff Jirsa commented on CASSANDRA-13780:
-----------------------------------------

De-prioritizing - Blocker is reserved for bugs that MUST be fixed before a release (ie: corruption bugs).
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141552#comment-16141552 ]

Kevin Rivait commented on CASSANDRA-13780:
-------------------------------------------

netcat tests on our dev environment between 2 of our Cassandra nodes show 9600 Mb/s throughput.

Our test approach:
- create a 4-node Cassandra cluster
- one keyspace with replication 2, one table
- load a few million rows; load on each node ~5GB
- add a fifth node

This test was repeated 3 times:
- TEST1 - stream throughput left at the default setting (200 Mb/s): we observe 26 Mb/s from the sending nodes
- TEST2 - stream throughput throttled to 10 Mb/s on all nodes: we observe 10 Mb/s from the sending nodes
- TEST3 - stream throughput set to 50 Mb/s on all nodes: we observe 26 Mb/s from the sending nodes

Ideally we would like to utilize 10% of our bandwidth for streaming, but we cannot even come close to the default 200 Mb/s (it seems capped at 26 Mb/s).
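A minimal sketch of that kind of raw netcat check; the port and receiver address are placeholders, and listener flags vary between netcat variants:

{code}
# On the receiving node: listen and discard everything.
nc -l 4444 > /dev/null
# On the sending node: push 10 GB of zeros through the link and time it.
time dd if=/dev/zero bs=1M count=10240 | nc RECEIVER_IP 4444
{code}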
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137849#comment-16137849 ]

Kurt Greaves commented on CASSANDRA-13780:
-------------------------------------------

Not really sure why you wouldn't get more than 80MB/s for the cluster unless there was some network bottleneck. Have you benchmarked network throughput between the nodes to confirm it's not something there?
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136849#comment-16136849 ]

Kevin Rivait commented on CASSANDRA-13780:
-------------------------------------------

Thank you Kurt.

Re 1: On our DEV environment (1 DC) we did set stream throughput to zero (unthrottled) on all nodes, including the new node, with no observed change in throughput. On our PROD environment (2 DCs) we set stream throughput to zero on all nodes in the DC we were adding the node to, and we observed streaming from all nodes in the local DC. We cannot achieve the default 200 Mb/s stream throughput per node. Our most recent experimental observation is that the total stream throughput for the whole cluster is ~80 Mb/s regardless of the number of nodes in the cluster; as we add more nodes, the throughput of each node drops such that the sum total remains ~80 Mb/s. Are there other parameters or bottlenecks that can reduce the streaming throughput of each node?

Re 2: You are correct. We went back and dumped the SSTable max/min dates and verified that the buckets older than TTL and GCGS are in fact being dropped.

Thank you,
Kevin
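One parameter worth ruling out when chasing a cluster-wide cap, offered here as a suggestion rather than something raised in the thread: cross-datacenter streams are throttled separately from local-DC ones.

{code}
# Both throttles take megabits/s; 0 disables the cap entirely.
nodetool getstreamthroughput            # check the current local-DC cap
nodetool setinterdcstreamthroughput 0   # cross-DC streams have their own cap
{code}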
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136696#comment-16136696 ]

Kurt Greaves commented on CASSANDRA-13780:
-------------------------------------------

1. This is one of the things vnodes are meant to help with, but they are not great for large clusters for a variety of reasons. One question: did you increase the stream throughput on all the nodes, or just on the joining node?

2. This is already recorded in the SSTable metadata. Once the minimum timestamp as reported by {{sstablemetadata}} is expired + GCGS, the SSTable will be dropped.