[jira] [Updated] (CASSANDRA-8463) Constant compaction under LCS
[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Deng updated CASSANDRA-8463:
    Labels: lcs  (was: )

> Constant compaction under LCS
> -----------------------------
>
>                 Key: CASSANDRA-8463
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8463
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>         Environment: Hardware is recent 2-socket, 16-core (x2 Hyperthreaded),
> 144G RAM, solid-state storage.
> Platform is Linux 3.2.51, Oracle JDK 64-bit 1.7.0_65.
> Heap is 32G total, 4G newsize.
> 8G/8G on-heap/off-heap memtables, offheap_buffer allocator, 0.5
> memtable_cleanup_threshold
> concurrent_compactors: 20
>            Reporter: Rick Branson
>            Assignee: Marcus Eriksson
>              Labels: lcs
>             Fix For: 2.1.3
>
>         Attachments: 0001-better-logging.patch,
> 0001-make-sure-we-set-lastCompactedKey-properly.patch, log-for-8463.txt
>
>
> It appears that tables configured with LCS will completely re-compact
> themselves over some period of time after upgrading from 2.0 to 2.1 (2.0.11
> -> 2.1.2, specifically). It starts out with <10 pending tasks for an hour or
> so, then starts building up, now with 50-100 tasks pending across the cluster
> after 12 hours. These nodes are under heavy write load, but were easily able
> to keep up in 2.0 (they rarely had >5 pending compaction tasks), so I don't
> think it's LCS in 2.1 actually being worse, just perhaps some different LCS
> behavior that causes the layout of tables from 2.0 to prompt the compactor to
> reorganize them?
>
> The nodes flushed ~11MB SSTables under 2.0. They're currently flushing ~36MB
> SSTables due to the improved memtable setup in 2.1. Before I upgraded the
> entire cluster to 2.1, I noticed the problem and tried several variations on
> the flush size, thinking perhaps the larger tables in L0 were causing some
> kind of cascading compactions. Even if they're sized roughly like the 2.0
> flushes were, same behavior occurs. I also tried both enabling & disabling
> STCS in L0 with no real change other than L0 began to back up faster, so I
> left the STCS in L0 enabled.
>
> Tables are configured with 32MB sstable_size_in_mb, which was found to be an
> improvement on the 160MB table size for compaction performance. Maybe this is
> wrong now? Otherwise, the tables are configured with defaults. Compaction has
> been unthrottled to help them catch-up. The compaction threads stay very
> busy, with the cluster-wide CPU at 45% "nice" time. No nodes have completely
> caught up yet. I'll update JIRA with status about their progress if anything
> interesting happens.
>
> From a node around 12 hours ago, around an hour after the upgrade, with 19
> pending compaction tasks:
> SSTables in each level: [6/4, 10, 105/100, 268, 0, 0, 0, 0, 0]
> SSTables in each level: [6/4, 10, 106/100, 271, 0, 0, 0, 0, 0]
> SSTables in each level: [1, 16/10, 105/100, 269, 0, 0, 0, 0, 0]
> SSTables in each level: [5/4, 10, 103/100, 272, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 11/10, 105/100, 270, 0, 0, 0, 0, 0]
> SSTables in each level: [1, 12/10, 105/100, 271, 0, 0, 0, 0, 0]
> SSTables in each level: [1, 14/10, 104/100, 267, 0, 0, 0, 0, 0]
> SSTables in each level: [9/4, 10, 103/100, 265, 0, 0, 0, 0, 0]
>
> Recently, with 41 pending compaction tasks:
> SSTables in each level: [4, 13/10, 106/100, 269, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 12/10, 106/100, 273, 0, 0, 0, 0, 0]
> SSTables in each level: [5/4, 11/10, 106/100, 271, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 12/10, 103/100, 275, 0, 0, 0, 0, 0]
> SSTables in each level: [2, 13/10, 106/100, 273, 0, 0, 0, 0, 0]
> SSTables in each level: [3, 10, 104/100, 275, 0, 0, 0, 0, 0]
> SSTables in each level: [6/4, 11/10, 103/100, 269, 0, 0, 0, 0, 0]
> SSTables in each level: [4, 16/10, 105/100, 264, 0, 0, 0, 0, 0]
>
> More information about the use case: writes are roughly uniform across these
> tables. The data is "sharded" across these 8 tables by key to improve
> compaction parallelism. Each node receives up to 75,000 writes/sec sustained
> at peak, and a small number of reads. This is a pre-production cluster that's
> being warmed up with new data, so the low volume of reads (~100/sec per node)
> is just from automatic sampled data checks, otherwise we'd just use STCS :)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
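The "SSTables in each level" lines quoted above can be read mechanically. The following is a minimal sketch, not part of the original report: it assumes an entry written as `count/limit` marks a level holding more SSTables than its target, and that a bare number is a level with no limit reported. The function names `parse_levels` and `over_target` are hypothetical.

```python
import re


def parse_levels(line):
    """Parse one 'SSTables in each level: [...]' line into (count, limit) pairs.

    Assumption: 'count/limit' means the level is over its target; a bare
    number carries no limit, so limit is None for that level.
    """
    match = re.search(r"\[(.*?)\]", line)
    if match is None:
        raise ValueError("no level list found in: %r" % line)
    levels = []
    for entry in match.group(1).split(","):
        entry = entry.strip()
        if "/" in entry:
            count, limit = entry.split("/")
            levels.append((int(count), int(limit)))
        else:
            levels.append((int(entry), None))
    return levels


def over_target(levels):
    """Return {level_index: excess_sstables} for levels reported above target."""
    return {
        index: count - limit
        for index, (count, limit) in enumerate(levels)
        if limit is not None
    }


if __name__ == "__main__":
    sample = "SSTables in each level: [6/4, 10, 105/100, 268, 0, 0, 0, 0, 0]"
    print(over_target(parse_levels(sample)))
```

Applied to the first quoted line, this reports L0 as two SSTables over its target and L2 as five over, which matches the picture of several levels sitting slightly above their limits at once.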
[jira] [Updated] (CASSANDRA-8463) Constant compaction under LCS
[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-8463:
---------------------------------------
    Component/s: Compaction
[jira] [Updated] (CASSANDRA-8463) Constant compaction under LCS
[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-8463:
---------------------------------------
    Reviewer: Yuki Morishita

Could you review [~yukim]? (unless you already checked the code as well [~rbranson]?)
[jira] [Updated] (CASSANDRA-8463) Constant compaction under LCS
[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-8463:
---------------------------------------
    Attachment: 0001-make-sure-we-set-lastCompactedKey-properly.patch

Seems we don't set lastCompactedKeys properly, meaning we always start from the sstable with the smallest key when trying to find compaction candidates. This could explain the weird compaction candidate picking above (the leveling becomes very unbalanced).

Could you try it out [~rbranson]?
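The effect described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's actual code: each SSTable is reduced to a hypothetical `(first_key, last_key)` pair of integers, the level is kept sorted by first key, and the candidate search starts at the first SSTable whose first key is greater than the remembered last compacted key, wrapping around at the end. The names `next_candidate` and `simulate` are invented for this example, and the simulation does not replace or rewrite SSTables after each pick.

```python
from bisect import bisect_right


def next_candidate(sstables, last_compacted_key):
    """Return the index of the SSTable to start the candidate search from.

    sstables: list of (first_key, last_key) tuples, sorted by first_key.
    last_compacted_key: last key covered by the previous compaction in this
    level, or None if nothing has been recorded.
    """
    firsts = [first for first, _last in sstables]
    if not firsts:
        return None
    if last_compacted_key is None:
        return 0
    index = bisect_right(firsts, last_compacted_key)
    # Wrap around to the smallest key once the end of the level is reached.
    return index if index < len(firsts) else 0


def simulate(sstables, rounds, update_last_key):
    """Pick a candidate `rounds` times, optionally remembering the last key."""
    picks = []
    last_compacted_key = None
    for _ in range(rounds):
        index = next_candidate(sstables, last_compacted_key)
        picks.append(sstables[index])
        if update_last_key:
            last_compacted_key = sstables[index][1]
    return picks


if __name__ == "__main__":
    level = [(0, 9), (10, 19), (20, 29)]
    print(simulate(level, 4, update_last_key=True))
    print(simulate(level, 4, update_last_key=False))
```

With the key remembered, the picks rotate through the level and wrap back to `(0, 9)` on the fourth round. With the key never updated, every round starts from `(0, 9)` again, which is the "always start from the sstable with the smallest key" behaviour the comment suspects.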
[jira] [Updated] (CASSANDRA-8463) Constant compaction under LCS
[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Branson updated CASSANDRA-8463:
    Summary: Constant compaction under LCS  (was: Upgrading 2.0 to 2.1 causes LCS to recompact all files)