[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur
[ https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104481#comment-13104481 ]

Sylvain Lebresne commented on CASSANDRA-3181:

+1

Compaction fails to occur
Key: CASSANDRA-3181
URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Jonathan Ellis
Labels: compaction
Fix For: 1.0.0
Attachments: 3181-2.txt, 3181.txt

Compaction just stops running at some point. To reproduce, insert about 20M rows with a 1G heap and you'll get around 1k sstables. Restarting doesn't help; you have to invoke a major compaction to get anything to happen.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur
[ https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103821#comment-13103821 ]

Sylvain Lebresne commented on CASSANDRA-3181:

What is unclear is what stopped the compactions from happening. Brandon said that restarting didn't help, but I take from that that compaction had already stopped before the restart. Basically, if the only problem is that no compaction happens after a restart as long as you don't do any inserts, then OK, that's not a big deal. But if compaction really does just stop running at some point, then there is something we need to fix.

Compaction fails to occur
Key: CASSANDRA-3181
URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Benjamin Coverston
Fix For: 1.0.0
[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur
[ https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103826#comment-13103826 ]

Benjamin Coverston commented on CASSANDRA-3181:

Using his repro, I was able to get compactions going again with a simple write/flush. Brandon says he isn't sure how compactions stopped. With the tiered compaction strategy, I see only two scenarios where compactions would stop like this:
# the server was restarted
# a compaction failed unexpectedly

The fact that there were no errors in the log makes #2 unlikely. I'm pretty sure we're just looking at a case where the server restarts and there is no further activity to trigger a flush.
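The following toy Java model illustrates the trigger Benjamin describes. It assumes, as the thread suggests, that compaction is checked only from the flush path and that nothing checks at startup. The class and method names are hypothetical, and the merge rule is a simple stand-in, not Cassandra's size-tiered bucketing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model: the flush path is the only thing that triggers a compaction check. */
final class FlushTriggeredModel {
    static final int MIN_THRESHOLD = 4; // hypothetical: merge once at least 4 sstables are waiting

    final List<Long> sstables = new ArrayList<>(); // sizes of on-disk sstables

    /** Simulates sstables left behind when the node stopped with compactions in flight. */
    void leaveBacklog(int count, long size) {
        for (int i = 0; i < count; i++) {
            sstables.add(size);
        }
    }

    /** Restart: sstables survive on disk, but no compaction check runs (the CASSANDRA-2444 behavior). */
    void restart() {
        // intentionally does nothing
    }

    /** A write/flush adds one sstable and, as the only trigger, checks for compaction. */
    void flush(long size) {
        sstables.add(size);
        maybeCompact();
    }

    /** Stand-in merge rule: combine MIN_THRESHOLD sstables into one until fewer remain. */
    private void maybeCompact() {
        while (sstables.size() >= MIN_THRESHOLD) {
            long merged = 0;
            for (int i = 0; i < MIN_THRESHOLD; i++) {
                merged += sstables.remove(sstables.size() - 1); // remove(int index) returns the size
            }
            sstables.add(merged);
        }
    }
}
```

With 1,000 backlogged sstables, `restart()` leaves all of them in place. A single `flush(1L)` then merges them down to fewer than four, which mirrors the "simple write/flush" that got compactions going again.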
[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur
[ https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103949#comment-13103949 ]

Sylvain Lebresne commented on CASSANDRA-3181:

bq. So I think repair was blocking minors

Repair should be on its own executor now, so it shouldn't block minor compactions.

bq. a compaction failed unexpectedly

Hmm, a compaction failing shouldn't stop other compactions. If it does, that is worth fixing.

bq. I'm pretty sure that we're just looking at a problem where a server restarts and there is no more activity triggering a flush

If that's the case, is there really much we want to do? And even if there is, we should move it to 1.0.1. I just want to make sure this doesn't hide a real, unknown problem.
[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur
[ https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103952#comment-13103952 ]

Jonathan Ellis commented on CASSANDRA-3181:

bq. CASSANDRA-2444 got in the way

I'm not sure what the right solution is here. I buy the premise of 2444 that you don't necessarily want to get hammered by compaction when you're first starting up (warming up caches), so I don't think "check for compactions every N seconds" is a great policy. But I'm not sure "check every N seconds, starting M minutes after startup" is great either, because it's not something a user will just guess when he's wondering "why aren't compactions happening yet?"

Any other ideas?
[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur
[ https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103950#comment-13103950 ]

Brandon Williams commented on CASSANDRA-3181:

bq. repair should be on its own executor now, so it shouldn't block minors.

OK, then maybe I just hit one of the OOM bugs, and compaction never fully completed. After restarting I didn't do any more writes, and we know compaction won't happen at startup.

bq. If that's the case, then is there really much we want to do ? And even if we want, we should move that to 1.0.1. Just want to make sure this doesn't hide a real, unknown, problem.

I think CASSANDRA-2444 was wrong. It should be an option, and one that is off by default. Starting a server with 1k sstables and having nothing happen is a bit of a shock, and having no way out besides hacks like forcing a flush or a major compaction isn't great.
[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur
[ https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104059#comment-13104059 ]

Benjamin Coverston commented on CASSANDRA-3181:

I was a little concerned about removing the scheduled compaction from leveldb, but the mechanics are really no different from tiered compaction: it stops when it's finished, assuming nothing goes wrong with the running compactions. To be (probably overly) pedantic, another advantage is that you are essentially kicking off only a single compaction, whereas when the server was brought down there were probably multiple compactions in flight.

+1
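The following is a minimal sketch of the startup behavior under review. It assumes the patch works roughly like this: one background check is submitted at startup, each finished compaction submits the next check, and the chain ends on its own once nothing is left to compact. Resubmitting from a `finally` block is one way to honor Sylvain's point that a failed compaction shouldn't stop later ones. All names are hypothetical, and this is not the code in 3181.txt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a single startup check, then each compaction schedules the next check. */
final class ChainedCompaction {
    static final int MIN_THRESHOLD = 4; // hypothetical threshold for "compaction needed"

    final AtomicInteger sstables = new AtomicInteger();
    final CountDownLatch idle = new CountDownLatch(1); // released when a check finds nothing to do
    final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "compaction");
        t.setDaemon(true); // don't keep the JVM alive just for this executor
        return t;
    });

    /** Kicks off exactly one background check at startup instead of waiting for a flush. */
    void onStartup() {
        submitBackground();
    }

    void submitBackground() {
        executor.submit(this::checkAndCompact);
    }

    private void checkAndCompact() {
        if (sstables.get() < MIN_THRESHOLD) {
            idle.countDown(); // nothing left: the chain stops by itself
            return;
        }
        try {
            merge();
        } finally {
            submitBackground(); // re-check even if merge() threw, so one failure doesn't end compaction
        }
    }

    /** Hypothetical stand-in for real compaction work: MIN_THRESHOLD sstables become one. */
    void merge() {
        sstables.addAndGet(-(MIN_THRESHOLD - 1));
    }

    /** Waits up to the given number of seconds for the chain to go idle. */
    boolean awaitIdle(long seconds) {
        try {
            return idle.await(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Starting from 1,000 sstables, one `onStartup()` call drains the backlog to a single sstable and then goes idle. A real implementation would also need to avoid retrying the same failing sstables forever, which this sketch does not handle.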
[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur
[ https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102751#comment-13102751 ]

Brandon Williams commented on CASSANDRA-3181:

Note that this is with the default compaction strategy.
[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur
[ https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103334#comment-13103334 ]

Benjamin Coverston commented on CASSANDRA-3181:

I tested this a bit and was able to reproduce the problem. Basically, what I think happened was:
# There were compactions in flight and scheduled.
# Something halted the compactions (probably a server restart).
# https://issues.apache.org/jira/browse/CASSANDRA-2444 got in the way.

I propose one of the following:
* add the option, as originally proposed, OR
* put compactions on a timer and periodically ask `does any compaction need to be done`. Also stagger the start of the compactions by some variable amount (5-10 minutes), similar to what we did with Hinted Handoff in 0.7.
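The following is a minimal sketch of the second proposal, built on `java.util.concurrent.ScheduledExecutorService`. It runs a periodic "does any compaction need to be done" check and delays the first run by a random amount within a window (5-10 minutes in the proposal). The class name, method names, and the period are hypothetical. The try/catch matters because, per the `ScheduledExecutorService` contract, a periodic task that throws has all its later runs suppressed. That would be another way for compaction to stop silently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: periodic compaction check with a randomly staggered first run. */
final class PeriodicCompactionCheck {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "compaction-check");
        t.setDaemon(true); // don't keep the JVM alive just for this timer
        return t;
    });

    /** Random first-run delay in [min, max), so restarted nodes don't all start compacting together. */
    static long staggeredDelay(long min, long max) {
        return ThreadLocalRandom.current().nextLong(min, max);
    }

    /** Runs check every period (in unit), starting after a random delay in [minDelay, maxDelay). */
    ScheduledFuture<?> start(Runnable check, long minDelay, long maxDelay, long period, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                check.run();
            } catch (RuntimeException e) {
                // a periodic task that throws is never run again, so report the failure and keep the schedule
                System.err.println("compaction check failed: " + e);
            }
        };
        return timer.scheduleWithFixedDelay(guarded, staggeredDelay(minDelay, maxDelay), period, unit);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

In the proposal's terms, a call would look something like `start(check, 5, 10, period, TimeUnit.MINUTES)`. Here `check` is whatever submits background compactions, and `period` is the still-undecided N. The random delay spreads load across restarted nodes. It does not address the concern that a user won't guess why nothing is compacting yet.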