[jira] [Commented] (CASSANDRA-8498) Replaying commit log records that are older than gc_grace is dangerous
[ https://issues.apache.org/jira/browse/CASSANDRA-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304265#comment-15304265 ]

Joshua McKenzie commented on CASSANDRA-8498:

bq. We advise operators not to do this (in order to avoid zombie data), but there are cases where it makes sense

If the assumed majority case is "don't do this unless you really know what you're doing", I think we should reflect that in our operations. I'm +1 on failing to start without a specific flag confirming that you know what you're doing, and on having a relatively verbose log message on failed startup without that flag.

> Replaying commit log records that are older than gc_grace is dangerous
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-8498
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8498
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>
> If we replay commit log records that are older than gc_grace we could
> introduce data corruption to the cluster. We should either (1) fail and
> suggest a repair, or (2) log an exception. I prefer (1).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8498) Replaying commit log records that are older than gc_grace is dangerous
[ https://issues.apache.org/jira/browse/CASSANDRA-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302972#comment-15302972 ]

Tyler Hobbs commented on CASSANDRA-8498:

bq. I'd prefer to log an error and skip the records involved but otherwise start up normally.

This would cause problems for tables with low (or 0) {{gc_grace_seconds}}, which is common for tables where everything is TTL'ed.

It seems like this problem is equivalent to "should we allow a node to start that's been down for more than {{gc_grace_seconds}}". We advise operators not to do this (in order to avoid zombie data), but there are cases where it makes sense, like the TTL case above, or if deletes are never performed on a cluster.

I'm sure there are operators out there who are not aware of these guidelines, so it might make sense to make them more explicit by requiring a {{-D}} flag to start when commit log segments are older than gc_grace. If the flag is not used, we fail to start and print a message about the guidelines and mention the flag.
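The flag-gated startup check proposed in this comment could look roughly like the sketch below. It is an illustration only, not Cassandra code: the function name, the exception type, and the `allow_stale_replay` parameter (standing in for the proposed {{-D}} flag, whose real name the thread does not specify) are all hypothetical.

```python
class StaleCommitLogError(RuntimeError):
    """Raised when startup is refused because of stale commit log segments."""


def check_commit_log_age(segment_ages_seconds, gc_grace_seconds,
                         allow_stale_replay=False):
    """Refuse startup if any segment is older than gc_grace, unless overridden.

    All names are hypothetical. `allow_stale_replay` stands in for the
    proposed -D flag; the thread does not say what it would be called.
    Returns the list of stale segment ages (empty if none are stale).
    """
    stale = [age for age in segment_ages_seconds if age > gc_grace_seconds]
    if stale and not allow_stale_replay:
        # Verbose message: state the guideline and mention the override.
        raise StaleCommitLogError(
            f"{len(stale)} commit log segment(s) are older than "
            f"gc_grace_seconds ({gc_grace_seconds}s); replaying them may "
            "resurrect deleted data. Run a repair, or restart with the "
            "override flag if you know this is safe (for example, all data "
            "is TTL'ed or deletes are never performed)."
        )
    return stale
```

With the override set, the check only reports which segments are stale, so an operator in the TTL-only or no-deletes case can still start the node.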
[jira] [Commented] (CASSANDRA-8498) Replaying commit log records that are older than gc_grace is dangerous
[ https://issues.apache.org/jira/browse/CASSANDRA-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250111#comment-14250111 ]

Jonathan Ellis commented on CASSANDRA-8498:
-------------------------------------------

What does failing buy us, other than making it really obvious that there's a problem? I'd prefer to log an error and skip the records involved but otherwise start up normally.

Replaying commit log records that are older than gc_grace is dangerous
----------------------------------------------------------------------

                Key: CASSANDRA-8498
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8498
            Project: Cassandra
         Issue Type: Improvement
         Components: Core
           Reporter: Benedict

If we replay commit log records that are older than gc_grace we could introduce data corruption to the cluster. We should either (1) fail and suggest a repair, or (2) log an exception. I prefer (1).
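The alternative suggested in this comment (log an error, skip the stale records, start normally) could be sketched as follows. This is a minimal illustration under stated assumptions, not Cassandra's replay code: the function name, the logger name, and the representation of records as `(write_time_seconds, payload)` tuples are all hypothetical.

```python
import logging

logger = logging.getLogger("commitlog_replay_sketch")  # hypothetical name


def replayable_records(records, now_seconds, gc_grace_seconds):
    """Return the records young enough to replay; log an error for the rest.

    `records` is a list of (write_time_seconds, payload) tuples, a
    hypothetical stand-in for real commit log entries.
    """
    kept = []
    skipped = 0
    for write_time, payload in records:
        if now_seconds - write_time > gc_grace_seconds:
            skipped += 1  # older than gc_grace: do not replay
        else:
            kept.append((write_time, payload))
    if skipped:
        logger.error(
            "Skipped %d commit log record(s) older than gc_grace_seconds (%d s)",
            skipped, gc_grace_seconds,
        )
    return kept
```

Note that with `gc_grace_seconds` set to 0, this sketch skips every record with a nonzero age, which is the concern raised elsewhere in the thread about tables where everything is TTL'ed.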
[jira] [Commented] (CASSANDRA-8498) Replaying commit log records that are older than gc_grace is dangerous
[ https://issues.apache.org/jira/browse/CASSANDRA-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250121#comment-14250121 ]

Benedict commented on CASSANDRA-8498:
-------------------------------------

In this situation a repair is a _really_ good idea, is all. Perhaps we should only fail if there hasn't been a sufficiently new repair to have likely fixed any issue?

Of course, I know I have the strongest penchant for C* autodeath, but mostly because we know that users do not read their log files as diligently as they should, and we get blamed if there are data corruption or loss problems.