[jira] [Commented] (CASSANDRA-11684) Cleanup key ranges during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308939#comment-17308939 ]

Jeff Jirsa commented on CASSANDRA-11684:
----------------------------------------

Close as a dupe of CASSANDRA-5051. In Slack I mentioned this morning: "I had implemented something like 5051 in an early version of the pluggable TWCS jar before I realized how terrifying it is given Cassandra's lack of actual cluster membership." The TL;DR is that I've learned a bit since 2015, and what I thought was a good idea then probably requires some technical support to do safely now.

> Cleanup key ranges during compaction
> ------------------------------------
>
>                 Key: CASSANDRA-11684
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11684
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction
>            Reporter: Stefan Podkowinski
>            Priority: Normal
>              Labels: gsoc2021, mentor
>
> Currently, cleanup is considered an optional, manual operation that users are
> told to run to free disk space after a node was affected by topology changes.
> However, unmanaged key ranges can also end up on a node in other ways, e.g.
> through sstable files manually added by an admin.
> I'm also not sure that unmanaged data is really that harmless, or that
> cleanup should be optional if you don't need to reclaim the disk space. When
> it comes to repairs, users are expected to purge a node after downtime, in
> case it was not fully covered by a repair within gc_grace afterwards, in
> order to avoid re-introducing deleted data. But the same could happen with
> unmanaged data, e.g. after topology changes activate unmanaged ranges again,
> or after restoring backups.
> I'd therefore suggest avoiding, during compactions, the rewriting of key
> ranges that no longer belong to a node and are older than gc_grace.
> Maybe we could also introduce another CLEANUP_COMPACTION operation to find
> candidates based on SSTable.first/last, in case we don't have pending regular
> or tombstone compactions.
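[Editor's note] The filter proposed in the description above (during compaction, drop keys that no longer belong to the node and are older than gc_grace) could be sketched roughly as follows. This is an illustrative model only, not Cassandra's real compaction API: `TokenRange`, `shouldDrop`, and the millisecond timestamps are assumptions made for the example.

```java
import java.util.List;

// Hypothetical sketch of compaction-time cleanup: a partition is dropped
// (not rewritten) only when its token falls outside every range the node
// owns AND all of its data is older than gc_grace. All names here are
// illustrative stand-ins, not Cassandra's actual classes.
public class CleanupDuringCompaction {

    // A token range, simplified to longs: (start exclusive, end inclusive].
    record TokenRange(long start, long end) {
        boolean contains(long token) {
            return token > start && token <= end;
        }
    }

    static boolean shouldDrop(long token,
                              long newestTimestampMillis,
                              long nowMillis,
                              long gcGraceMillis,
                              List<TokenRange> ownedRanges) {
        boolean owned = ownedRanges.stream().anyMatch(r -> r.contains(token));
        boolean olderThanGcGrace =
                nowMillis - newestTimestampMillis > gcGraceMillis;
        // Only drop unmanaged data old enough that it could otherwise
        // resurrect deleted rows after a topology change or restore.
        return !owned && olderThanGcGrace;
    }

    public static void main(String[] args) {
        List<TokenRange> owned = List.of(new TokenRange(0, 100));
        long now = 1_000_000L;
        long gcGrace = 10_000L;
        // Owned token: always keep.
        System.out.println(shouldDrop(50, 0, now, gcGrace, owned));
        // Unowned and older than gc_grace: drop.
        System.out.println(shouldDrop(200, 0, now, gcGrace, owned));
        // Unowned but recent: keep, the node may be about to own it.
        System.out.println(shouldDrop(200, now - 1, now, gcGrace, owned));
    }
}
```

Keeping recent unmanaged data (the third case) is the conservative half of the proposal: an in-flight range movement may be about to make this node a legitimate owner of that token, which is exactly the membership hazard Jeff raises above.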
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
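[Editor's note] The CLEANUP_COMPACTION idea from the ticket description, selecting candidate sstables from SSTable.first/last, could be sketched in the same illustrative style. Again, the types below are hypothetical stand-ins, not Cassandra's actual SSTableReader API:

```java
import java.util.List;

// Hypothetical sketch of CLEANUP_COMPACTION candidate selection: an sstable
// whose [first, last] token span fits entirely inside a single owned range
// cannot contain unmanaged keys, so it can be skipped without reading data.
// The check is deliberately conservative: a span covered only by the union
// of several owned ranges is still flagged as a candidate.
public class CleanupCandidates {

    // (start exclusive, end inclusive], simplified to longs.
    record TokenRange(long start, long end) {}

    // Only the min/max token metadata matters for this check.
    record SSTable(String name, long firstToken, long lastToken) {}

    static boolean isCleanupCandidate(SSTable sstable, List<TokenRange> owned) {
        // Candidate unless one owned range covers the whole token span.
        return owned.stream().noneMatch(r ->
                sstable.firstToken() > r.start()
                        && sstable.lastToken() <= r.end());
    }

    public static void main(String[] args) {
        List<TokenRange> owned =
                List.of(new TokenRange(0, 100), new TokenRange(200, 300));
        // Fully inside (0, 100]: not a candidate.
        System.out.println(
                isCleanupCandidate(new SSTable("a", 10, 90), owned));
        // Spans the gap between owned ranges: candidate.
        System.out.println(
                isCleanupCandidate(new SSTable("b", 90, 250), owned));
    }
}
```

The appeal of this check is cost: it needs only per-sstable min/max token metadata, so candidates can be found even when no regular or tombstone compactions are pending, as the description suggests.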
[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308915#comment-17308915 ]

Paulo Motta commented on CASSANDRA-11684:
-----------------------------------------

bq. I think Jeff is speaking to the concept here

Yeah, I'm curious why he now claims this is unsafe, since he supported it on CASSANDRA-5051. But it seems there could be some edge cases during network partitions. Any objection to closing this as a dupe of CASSANDRA-5051?
[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308912#comment-17308912 ]

Brandon Williams commented on CASSANDRA-11684:
----------------------------------------------

I think Jeff is speaking to the concept here, which seems like a duplicate of CASSANDRA-5051.
[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308865#comment-17308865 ]

Paulo Motta commented on CASSANDRA-11684:
-----------------------------------------

[~jjirsa] can you elaborate on why this is an issue? Unless I'm missing something, this seems to be a duplicate of CASSANDRA-5051.
[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284966#comment-17284966 ]

Jeff Jirsa commented on CASSANDRA-11684:
----------------------------------------

Until strong cluster membership is a thing, this is probably more unsafe than people realize during range movements (nodetool move, expansions, shrinks, etc.).