[jira] [Commented] (CASSANDRA-11684) Cleanup key ranges during compaction

2021-03-25 - Jeff Jirsa (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308939#comment-17308939 ]

Jeff Jirsa commented on CASSANDRA-11684:


Close as a dupe of CASSANDRA-5051.

As I mentioned in Slack this morning:

"I had implemented something like 5051 in an early version of the pluggable 
twcs jar before i realized how terrifying it is given cassandra's lack of 
actual cluster membership"

The TL;DR is that I've learned a bit since 2015, and what I thought was a good 
idea then probably requires more technical groundwork to do safely now.


> Cleanup key ranges during compaction
> 
>
> Key: CASSANDRA-11684
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11684
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Stefan Podkowinski
>Priority: Normal
>
> Currently cleanup is considered an optional, manual operation that users are 
> told to run to free disk space after a node was affected by topology changes. 
> However, unmanaged key ranges could also end up on a node in other ways, 
> e.g. through sstable files manually added by an admin. 
> I'm also not sure unmanaged data is really that harmless, or that cleanup 
> should really be optional if you don't need to reclaim the disk space. When 
> it comes to repairs, users are expected to purge a node after downtime, in 
> case it was not fully covered by a repair within gc_grace afterwards, in 
> order to avoid re-introducing deleted data. But the same could happen with 
> unmanaged data, e.g. after topology changes activate unmanaged ranges again 
> or after restoring backups.
> I'd therefore suggest avoiding, during compactions, the rewriting of key 
> ranges that no longer belong to the node and are older than gc_grace. 
> Maybe we could also introduce another CLEANUP_COMPACTION operation to find 
> candidates based on SSTable.first/last, in case we don't have pending 
> regular or tombstone compactions.
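
A minimal sketch of the two ideas above, using stand-in types and hypothetical 
names rather than Cassandra's actual compaction internals (ring wraparound and 
tombstone handling are ignored for brevity):

{code:java}
// Sketch only: stand-in types, not Cassandra's real compaction API.
import java.util.List;

final class CleanupCompactionSketch
{
    // (left, right] token range, matching Cassandra's Range semantics;
    // wraparound of the ring is ignored here.
    record TokenRange(long left, long right)
    {
        boolean contains(long token) { return token > left && token <= right; }

        // Does the inclusive sstable span [first, last] intersect this range?
        boolean overlaps(long first, long last) { return first <= right && last > left; }
    }

    // Minimal stand-in for per-sstable metadata (SSTable.first/last).
    record SSTableMeta(long firstToken, long lastToken) {}

    // Idea 1: while compacting, drop a partition only if it lies outside
    // every locally owned range AND its newest data is older than gc_grace,
    // so a stale copy cannot later resurrect deleted data.
    static boolean purgeable(long partitionToken,
                             long newestTimestampSeconds,
                             List<TokenRange> ownedRanges,
                             long nowSeconds,
                             long gcGraceSeconds)
    {
        for (TokenRange r : ownedRanges)
            if (r.contains(partitionToken))
                return false;                 // still owned: always keep
        return nowSeconds - newestTimestampSeconds > gcGraceSeconds;
    }

    // Idea 2: choose CLEANUP_COMPACTION candidates cheaply from metadata
    // alone, taking sstables whose whole token span is out of range.
    static boolean isCleanupCandidate(SSTableMeta s, List<TokenRange> ownedRanges)
    {
        for (TokenRange r : ownedRanges)
            if (r.overlaps(s.firstToken(), s.lastToken()))
                return false;                 // touches an owned range
        return true;
    }
}
{code}

The sstable-level check is what would make a dedicated CLEANUP_COMPACTION 
cheap: it needs only the first/last tokens already kept in sstable metadata, 
not a scan of the data itself.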






[jira] [Commented] (CASSANDRA-11684) Cleanup key ranges during compaction

2021-03-25 - Paulo Motta (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308915#comment-17308915 ]

Paulo Motta commented on CASSANDRA-11684:

bq. I think Jeff is speaking to the concept here

Yeah, I'm curious why he claims this is unsafe now, since he supported it on 
CASSANDRA-5051. But it seems there could be some edge cases during network 
partitions.

Any objection to closing this as a dupe of CASSANDRA-5051?







[jira] [Commented] (CASSANDRA-11684) Cleanup key ranges during compaction

2021-03-25 - Brandon Williams (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308912#comment-17308912 ]

Brandon Williams commented on CASSANDRA-11684:

I think Jeff is speaking to the concept here, which seems like a duplicate of 
CASSANDRA-5051.







[jira] [Commented] (CASSANDRA-11684) Cleanup key ranges during compaction

2021-03-25 - Paulo Motta (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308865#comment-17308865 ]

Paulo Motta commented on CASSANDRA-11684:

[~jjirsa] can you elaborate on why this is an issue? Unless I'm missing 
something, this seems to be a duplicate of CASSANDRA-5051.







[jira] [Commented] (CASSANDRA-11684) Cleanup key ranges during compaction

2021-02-15 - Jeff Jirsa (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284966#comment-17284966 ]

Jeff Jirsa commented on CASSANDRA-11684:


Until strong cluster membership is a thing, this is probably more unsafe than 
people realize during range movements (nodetool move, expansions, shrinks, etc.).
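
A hedged sketch of the hazard, with hypothetical names: during a range 
movement a node's ownership snapshot is gossip-derived and can be stale, so at 
a minimum pending inbound ranges must be protected from purging, and the 
conservative default is to purge nothing at all while any movement is underway:

{code:java}
// Sketch only, hypothetical names: why auto-cleanup is risky mid-move.
import java.util.HashSet;
import java.util.Set;

final class RangeMovementGuard
{
    record TokenRange(long left, long right) {}

    // Anything owned OR pending-inbound must never be purged: data for a
    // pending range may still be streaming toward this node.
    static Set<TokenRange> protectedRanges(Set<TokenRange> owned,
                                           Set<TokenRange> pendingInbound)
    {
        Set<TokenRange> keep = new HashSet<>(owned);
        keep.addAll(pendingInbound);
        return keep;
    }

    // Even with pending ranges included, nodes may briefly disagree about
    // the ring; without strong membership the only safe default is to
    // disable automatic purging whenever a movement is in progress.
    static boolean safeToAutoCleanup(boolean anyRangeMovementInProgress)
    {
        return !anyRangeMovementInProgress;
    }
}
{code}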




