[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-12 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065419#comment-18065419
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/12/26 3:50 PM:


There is 

{code}
protected static String buildSnapshotId(String keyspaceName, String tableName, UUID tableId, String tag)
{
    return String.format("%s:%s:%s:%s", keyspaceName, tableName, tableId, tag);
}
{code}

So when we pass null there, the string would be "ks:tb:null:tag" instead of 
"ks:tb:some-uuid:tag".

I am not sure whether we want to do something about that; I would rather have 
"ks:tb:some-uuid:tag" when we have an id and "ks:tb:tag" when we don't.

There is also

{code}
/**
 * Unique identifier of a snapshot. Used
 * only to deduplicate snapshots internally,
 * not exposed externally.
 * 
 * Format: "$ks:$table_name:$table_id:$tag"
 */
public String getId()
{
    return buildSnapshotId(keyspaceName, tableName, tableId, tag);
}
{code}

which explicitly says this is only internally-facing so we should be fine to 
change it.
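
To make the two shapes of the id concrete, here is a minimal sketch of a 
null-tolerant variant. The class name SnapshotIdSketch is made up for 
illustration, and this is only one way the change could look, not the actual 
patch: with an id it keeps the existing four-part format, without one it drops 
the id segment.

```java
import java.util.UUID;

// Hypothetical illustration class; not part of Cassandra.
final class SnapshotIdSketch
{
    // Keeps the existing "$ks:$table_name:$table_id:$tag" format when an id is
    // present and falls back to "$ks:$table_name:$tag" when tableId is null.
    static String buildSnapshotId(String keyspaceName, String tableName, UUID tableId, String tag)
    {
        if (tableId == null)
            return String.format("%s:%s:%s", keyspaceName, tableName, tag);
        return String.format("%s:%s:%s:%s", keyspaceName, tableName, tableId, tag);
    }

    public static void main(String[] args)
    {
        UUID id = UUID.fromString("5e2b9f60-1c2d-11ee-be56-0242ac120002");
        UUID missing = null;

        System.out.println(buildSnapshotId("ks", "tb", id, "tag"));
        System.out.println(buildSnapshotId("ks", "tb", missing, "tag"));

        // For comparison, what the unmodified format string yields for a null id.
        System.out.println(String.format("%s:%s:%s:%s", "ks", "tb", missing, "tag"));
    }
}
```

Since the id is only used for internal deduplication, the only requirement in 
this sketch is that two distinct snapshots never map to the same string.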

As 5.0 is more urgent, I would just do 5.0 now and leave trunk for another 
ticket. 


> Snapshots from tables without table-id embedded in their folder name are not 
> loaded by SnapshotLoader
> -
>
> Key: CASSANDRA-21173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21173
> Project: Apache Cassandra
>  Issue Type: Bug
>  Components: Local/Snapshots, Local/Startup and Shutdown
>Reporter: Matt Byrd
>Assignee: Matt Byrd
>Priority: Normal
> Fix For: 5.0.x, 5.1, 6.x
>
> Attachments: ci_summary_trunk_mbyrd_CASSANDRA-21173.html
>
>
> Tables created prior to 2.1 do not have a table-id embedded in their table 
> folder name.
> This is handled correctly in Directories.java (see constructor) unfortunately 
> in SnapshotLoader, we use a regex which attempts to extract the table-id and 
> hence skips over any tables created prior to 2.1.
> The end result is that these tables are not visible in list snapshot and more 
> importantly cannot be cleared via nodetool clearsnapshot. This was noticed 
> upon major upgrade to 5.0.
> I've observed this on 5.0, from reading the code it appears likely improved 
> in 5.1, in that it now requires a restart in addition to trigger.
> Some related tickets:
> Introduction of table-id and backwards compatible handling of old folders 
> originally here:
> https://issues.apache.org/jira/browse/CASSANDRA-5202
> Machinery to list snapshots which doesn’t handle old format was added here:
> https://issues.apache.org/jira/browse/CASSANDRA-16843
> https://github.com/apache/cassandra/commit/31aa17a2a3b18bdda723123cad811f075287807d
> There was some discussion at the time of not handling pre 2.1 tables here:
> https://issues.apache.org/jira/browse/CASSANDRA-16843?focusedCommentId=17440088&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17440088
> Then nodetool clearsnapshot stopped working here with:
> https://issues.apache.org/jira/browse/CASSANDRA-17757
> Things improve a bit in 5.1 with 
> https://issues.apache.org/jira/browse/CASSANDRA-18111
> Now we no longer try and load the snapshots via SnapshotLoader in entirety 
> before deciding if we can clear them, but instead make use of 
> SnapshotManager. Whilst snapshots taken while the jvm is running are now 
> visible and clearable, from reading upon restart we lose that information and 
> cannot view/clear snapshots created before the restart.
> One solution to handle these pre 2.1 tables, is to include the table-id in 
> the manifest.json, then we'll be able to read this information if not 
> available from folder name upon restart.
> Another possib

[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-12 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065419#comment-18065419
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/12/26 3:51 PM:


There is 

{code}
protected static String buildSnapshotId(String keyspaceName, String tableName, UUID tableId, String tag)
{
    return String.format("%s:%s:%s:%s", keyspaceName, tableName, tableId, tag);
}
{code}

So when we pass null there, the string would be "ks:tb:null:tag" instead of 
"ks:tb:some-uuid:tag".

I am not sure whether we want to do something about that; I would rather have 
"ks:tb:some-uuid:tag" when we have an id and "ks:tb:tag" when we don't.

There is also

{code}
/**
 * Unique identifier of a snapshot. Used
 * only to deduplicate snapshots internally,
 * not exposed externally.
 * 
 * Format: "$ks:$table_name:$table_id:$tag"
 */
public String getId()
{
    return buildSnapshotId(keyspaceName, tableName, tableId, tag);
}
{code}

which explicitly says this is only internally-facing so we should be fine to 
change it.

As 5.0 is more urgent, I would just do 5.0 now and leave trunk for another 
ticket. 



[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064249#comment-18064249
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/9/26 8:36 PM:
---

Heh, I was wrong, it _does_ work. I just forgot to update the regexp. So listing 
and clearing of snapshots without a table id, where the tables themselves are 
dropped, does work. 
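
For illustration only, a pattern that accepts both directory shapes could look 
like the sketch below. It is not the regexp used in SnapshotLoader. The class 
TableDirNameSketch, its method names, and the assumption that the id suffix is a 
dash followed by 32 lowercase hex characters are mine.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not Cassandra code.
final class TableDirNameSketch
{
    // "<table>" alone (pre-2.1 layout) or "<table>-<32 hex chars>" (id without dashes).
    private static final Pattern TABLE_DIR =
        Pattern.compile("(?<name>\\w+)(?:-(?<id>[0-9a-f]{32}))?");

    // Table name for either layout, empty if the directory name matches neither.
    static Optional<String> tableName(String dirName)
    {
        Matcher m = TABLE_DIR.matcher(dirName);
        return m.matches() ? Optional.of(m.group("name")) : Optional.empty();
    }

    // Table id when the suffix is present, empty for the old layout.
    static Optional<UUID> tableId(String dirName)
    {
        Matcher m = TABLE_DIR.matcher(dirName);
        if (!m.matches() || m.group("id") == null)
            return Optional.empty();
        String hex = m.group("id");
        // Split the 32 hex characters into the two 64-bit halves of a UUID.
        long msb = Long.parseUnsignedLong(hex.substring(0, 16), 16);
        long lsb = Long.parseUnsignedLong(hex.substring(16), 16);
        return Optional.of(new UUID(msb, lsb));
    }

    public static void main(String[] args)
    {
        System.out.println(tableName("tb-5e2b9f601c2d11eebe560242ac120002"));
        System.out.println(tableId("tb-5e2b9f601c2d11eebe560242ac120002"));
        System.out.println(tableName("tb"));
        System.out.println(tableId("tb"));
    }
}
```

The point of making the id group optional is that a folder without the suffix 
still matches and is not skipped; callers then have to cope with an absent id.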


> Snapshots from tables without table-id embedded in their folder name are not 
> loaded by SnapshotLoader
> -
>
> Key: CASSANDRA-21173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21173
> Project: Apache Cassandra
>  Issue Type: Bug
>  Components: Local/Snapshots, Local/Startup and Shutdown
>Reporter: Matt Byrd
>Assignee: Matt Byrd
>Priority: Normal
> Fix For: 5.0.x, 5.1, 6.x
>
> Attachments: ci_summary_trunk_mbyrd_CASSANDRA-21173.html
>
>
> Tables created prior to 2.1 do not have a table-id embedded in their table 
> folder name.
> This is handled correctly in Directories.java (see constructor) unfortunately 
> in SnapshotLoader, we use a regex which attempts to extract the table-id and 
> hence skips over any tables created prior to 2.1.
> The end result is that these tables are not visible in list snapshot and more 
> importantly cannot be cleared via nodetool clearsnapshot. This was noticed 
> upon major upgrade to 5.0.
> I've observed this on 5.0, from reading the code it appears likely improved 
> in 5.1, in that it now requires a restart in addition to trigger.
> Some related tickets:
> Introduction of table-id and backwards compatible handling of old folders 
> originally here:
> https://issues.apache.org/jira/browse/CASSANDRA-5202
> Machinery to list snapshots which doesn’t handle old format was added here:
> https://issues.apache.org/jira/browse/CASSANDRA-16843
> https://github.com/apache/cassandra/commit/31aa17a2a3b18bdda723123cad811f075287807d
> There was some discussion at the time of not handling pre 2.1 tables here:
> https://issues.apache.org/jira/browse/CASSANDRA-16843?focusedCommentId=17440088&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17440088
> Then nodetool clearsnapshot stopped working here with:
> https://issues.apache.org/jira/browse/CASSANDRA-17757
> Things improve a bit in 5.1 with 
> https://issues.apache.org/jira/browse/CASSANDRA-18111
> Now we no longer try and load the snapshots via SnapshotLoader in entirety 
> before deciding if we can clear them, but instead make use of 
> SnapshotManager. Whilst snapshots taken while the jvm is running are now 
> visible and clearable, from reading upon restart we lose that information and 
> cannot view/clear snapshots created before the restart.
> One solution to handle these pre 2.1 tables, is to include the table-id in 
> the manifest.json, then we'll be able to read this information if not 
> available from folder name upon restart.
> Another possibility which doesn't fix as many problems, is just to expose via 
> jmx/nodetool
> something to allow operators to bypass the snapshot loading mechanism and 
> directly clear the old pre-2.1 snapshots.
> A more involved and risky change would be to somehow think about how we 
> migrate all this existing data in different folder structures to new 
> consistent folder structure, but this seems quite involved and would likely 
> deserve it's own JIRA at least.
> I have a patch locally against trunk for the first approach, just storing the 
> tableId in the manifest.json, which does this and will run it against CI.
> I'll have a further think about if there are any other approaches, if anyone 
> has any ideas let me know.
> Another thing to consider is where we should apply this change.
> Probably at a minimum 5.0, since that's where one can no longer nodetool 
> clearsnapshot on certain tables and the effect is a bit worse there than in 
> 5.1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064242#comment-18064242
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/9/26 8:12 PM:
---

They are manageable in 2.0. Why can they not be listed and cleared in 2.0? 
Listing and clearing them in 5.0 rather than in 2.0 means that there is at 
least a theoretical chance that somebody is going to restore them after the 5.0 
upgrade? 

Actually ... yeah. The fact that you run on 2.0 does not mean your snapshots are 
12 years old ... 





[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064234#comment-18064234
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/9/26 8:00 PM:
---

Sorry for asking, but for what exact purpose do you have snapshots of dropped 
tables from the times of 2.0 while wanting to upgrade to 5.0? I just cannot wrap 
my mind around the use case. These snapshots have to be something like 12 years 
old, and of _dropped tables_. Why don't you just back them up? I am also not 
sure that we should care so much about stuff in 2.0; these versions are 
officially discontinued and not supported anymore. 

??We could I suppose go with leveraging CFS for 5.0 if we have any concerns 
with changing on disk structures in an already released version. Though I guess 
in SnapshotManifest as you mentioned previously we have ignoreUnknown = true 
and I'm not sure I can find any concrete problems.??

I am trying to be as frictionless as possible. If we can go without, we should. 
Why would we want to introduce it if we do not need it? That is my line of 
thinking here.

We can list snapshots of dropped tables since (1). I have not tested what 
happens when we try to load a snapshot of a dropped table for which there is no 
table id; that is a super corner case. But in that case I do not think that 
adding the table id to the manifest would help anyway. A table is dropped, so we 
do not have it via ColumnFamilyStore, and the directory does not contain it, so 
where would you actually want to get it from?

https://issues.apache.org/jira/browse/CASSANDRA-16843
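
The question of where the id could come from can be sketched as a lookup order: 
try each candidate source in turn and accept that a dropped pre-2.1 table may 
yield nothing. Everything named here (TableIdResolutionSketch, resolve, and the 
idea of passing the sources as suppliers) is hypothetical and only illustrates 
the argument; it is not how Cassandra resolves table ids.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical illustration; not Cassandra code.
final class TableIdResolutionSketch
{
    // Returns the first id that any source can provide, in the order given.
    // Sources are suppliers so that later ones are only consulted when needed.
    @SafeVarargs
    static Optional<UUID> resolve(Supplier<Optional<UUID>>... sources)
    {
        for (Supplier<Optional<UUID>> source : sources)
        {
            Optional<UUID> id = source.get();
            if (id.isPresent())
                return id;
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        UUID fromSchema = UUID.fromString("5e2b9f60-1c2d-11ee-be56-0242ac120002");

        // Live table created before 2.1: nothing in the folder name or the
        // manifest, but the running schema still knows the id.
        System.out.println(resolve(() -> Optional.empty(),
                                   () -> Optional.empty(),
                                   () -> Optional.of(fromSchema)));

        // Dropped table created before 2.1: no source has the id.
        System.out.println(resolve(() -> Optional.empty(),
                                   () -> Optional.empty(),
                                   () -> Optional.empty()));
    }
}
```

In the second case the result is empty whatever the order, which is the corner 
case described above: code consuming the result has to tolerate a missing id.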



[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064234#comment-18064234
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/9/26 7:56 PM:
---

Sorry for asking, but for what exact purpose do you have snapshots of dropped 
tables from the times of 2.0 while wanting to upgrade to 5.0? I just cannot wrap 
my mind around the use case. These snapshots have to be something like 12 years 
old, and of _dropped tables_. Why don't you just back them up? I am also not 
sure that we should care so much about stuff in 2.0; these versions are 
officially discontinued and not supported anymore. 

??We could I suppose go with leveraging CFS for 5.0 if we have any concerns 
with changing on disk structures in an already released version. Though I guess 
in SnapshotManifest as you mentioned previously we have ignoreUnknown = true 
and I'm not sure I can find any concrete problems.??

I am trying to be as frictionless as possible. If we can go without, we should. 
Why would we want to introduce it if we do not need it? That is my line of 
thinking here.

We can list snapshots of dropped tables since (1). I have not tested what 
happens when we try to load a snapshot of a dropped table for which there is no 
table id; that is a super corner case. But in that case I do not think that 
adding the table id to the manifest would help anyway. A table is dropped, so we 
do not have it via ColumnFamilyStore, and the directory does not contain it, so 
where would you actually want to get it from?

https://issues.apache.org/jira/browse/CASSANDRA-16843



[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064234#comment-18064234
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/9/26 7:47 PM:
---

Sorry for asking, but for what exact purpose do you have snapshots of dropped 
tables from the times of 2.0 while wanting to upgrade to 5.0? I just cannot wrap 
my mind around the use case. These snapshots have to be something like 12 years 
old, and of _dropped tables_. Why don't you just back them up? I am also not 
sure that we should care so much about stuff in 2.0; these versions are 
officially discontinued and not supported anymore. 


was (Author: smiklosovic):
Sorry for asking but from what exact purpose do you have snapshots on dropped 
tables from times of 2.0 and you want to upgrade to 5.0? I just can not wrap my 
mind about the use case. These snapshots have to have like 12 years, of 
_dropped tables_. Why dont you just back it up? I am also not sure that we 
should care so much about stuff in 2.0, these versions are officially 
discontinued and not supported anymore. 

> Snapshots from tables without table-id embedded in their folder name are not 
> loaded by SnapshotLoader
> -
>
> Key: CASSANDRA-21173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21173
> Project: Apache Cassandra
>  Issue Type: Bug
>  Components: Local/Snapshots, Local/Startup and Shutdown
>Reporter: Matt Byrd
>Assignee: Matt Byrd
>Priority: Normal
> Fix For: 5.0.x, 5.1, 6.x
>
> Attachments: ci_summary_trunk_mbyrd_CASSANDRA-21173.html
>
>
> Tables created prior to 2.1 do not have a table-id embedded in their table 
> folder name.
> This is handled correctly in Directories.java (see constructor) unfortunately 
> in SnapshotLoader, we use a regex which attempts to extract the table-id and 
> hence skips over any tables created prior to 2.1.
> The end result is that these tables are not visible in list snapshot and more 
> importantly cannot be cleared via nodetool clearsnapshot. This was noticed 
> upon major upgrade to 5.0.
> I've observed this on 5.0, from reading the code it appears likely improved 
> in 5.1, in that it now requires a restart in addition to trigger.
> Some related tickets:
> Introduction of table-id and backwards compatible handling of old folders 
> originally here:
> https://issues.apache.org/jira/browse/CASSANDRA-5202
> Machinery to list snapshots which doesn’t handle old format was added here:
> https://issues.apache.org/jira/browse/CASSANDRA-16843
> https://github.com/apache/cassandra/commit/31aa17a2a3b18bdda723123cad811f075287807d
> There was some discussion at the time of not handling pre 2.1 tables here:
> https://issues.apache.org/jira/browse/CASSANDRA-16843?focusedCommentId=17440088&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17440088
> Then nodetool clearsnapshot stopped working here with:
> https://issues.apache.org/jira/browse/CASSANDRA-17757
> Things improve a bit in 5.1 with 
> https://issues.apache.org/jira/browse/CASSANDRA-18111
> Now we no longer try and load the snapshots via SnapshotLoader in entirety 
> before deciding if we can clear them, but instead make use of 
> SnapshotManager. Whilst snapshots taken while the jvm is running are now 
> visible and clearable, from reading upon restart we lose that information and 
> cannot view/clear snapshots created before the restart.
> One solution to handle these pre 2.1 tables, is to include the table-id in 
> the manifest.json, then we'll be able to read this information if not 
> available from folder name upon restart.
> Another possibility which doesn't fix as many problems, is just to expose via 
> jmx/nodetool
> something to allow operators to bypass the snapshot loading mechanism and 
> directly clear the old pre-2.1 snapshots.
> A more involved and risky change would be to think about how we migrate all 
> this existing data in different folder structures to a new, consistent folder 
> structure, but this seems quite involved and would likely deserve its own 
> JIRA at least.
> I have a patch locally against trunk for the first approach, just storing the 
> tableId in the manifest.json, and will run it against CI.
> I'll have a further think about whether there are any other approaches; if 
> anyone has any ideas, let me know.
> Another thing to consider is where we should apply this change.
> Probably at a minimum 5.0, since that's where one can no longer nodetool 
> clearsnapshot on certain tables and the effect is a bit worse there than in 
> 5.1.
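
The regex behaviour described in the report can be illustrated with a small, 
self-contained sketch. Both patterns and the class name TableDirNames are 
hypothetical and are not the actual SnapshotLoader code; the sketch only 
assumes that a post-2.1 table folder is named "<table>-<32 hex chars>" and a 
pre-2.1 folder is just "<table>".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TableDirNames
{
    // Hypothetical strict pattern: the "-<32 hex chars>" suffix is mandatory,
    // so a pre-2.1 folder name never matches and is silently skipped.
    static final Pattern STRICT =
        Pattern.compile("(?<tableName>\\w+)-(?<tableId>[0-9a-f]{32})");

    // Hypothetical lenient pattern: the table-id suffix is optional.
    static final Pattern LENIENT =
        Pattern.compile("(?<tableName>\\w+)(?:-(?<tableId>[0-9a-f]{32}))?");

    /** True if the strict pattern accepts the whole folder name. */
    static boolean strictMatches(String dirName)
    {
        return STRICT.matcher(dirName).matches();
    }

    /** Table id from the folder name, or empty for a name without one. */
    static Optional<String> tableId(String dirName)
    {
        Matcher m = LENIENT.matcher(dirName);
        if (!m.matches())
            return Optional.empty();
        // group() returns null when the optional group did not participate
        return Optional.ofNullable(m.group("tableId"));
    }

    /** Table name from the folder name, with or without an id suffix. */
    static Optional<String> tableName(String dirName)
    {
        Matcher m = LENIENT.matcher(dirName);
        return m.matches() ? Optional.of(m.group("tableName")) : Optional.empty();
    }

    public static void main(String[] args)
    {
        for (String dir : new String[]{ "events", "events-0123456789abcdef0123456789abcdef" })
            System.out.println(dir + " strict=" + strictMatches(dir) + " id=" + tableId(dir));
    }
}
```

With an optional group, the caller still has to decide what to do when the id 
is absent, which is what the rest of this ticket discusses.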



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---

[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064234#comment-18064234
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/9/26 7:47 PM:
---

Sorry for asking, but for what exact purpose do you keep snapshots of dropped 
tables from the times of 2.0 while upgrading to 5.0? I just cannot wrap my mind 
around the use case. These snapshots must be about 12 years old, and they are 
of _dropped tables_. Why don't you just back them up? I am also not sure that 
we should care so much about stuff from 2.0; these versions are officially 
discontinued and not supported anymore. 





-

[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063606#comment-18063606
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/6/26 6:05 PM:
---

In SnapshotLoader, you are doing

{{UUID tableId = snapshotDirMatcher.group("tableId") != null ? 
parseUUID(snapshotDirMatcher.group("tableId")) : null;}}

Could you not rework it to this?
{code:java}
// A holder instead of a plain local, e.g. so that the value can be captured by a lambda
final AtomicReference<UUID> tableId = new AtomicReference<>();
if (snapshotDirMatcher.group("tableId") == null)
{
    // No table id in the folder name: fall back to the live table, if it exists
    ColumnFamilyStore cfs = ColumnFamilyStore.getIfExists(keyspaceName, tableName);
    if (cfs != null)
    {
        tableId.set(cfs.metadata.id.asUUID());
    }
}
else
{
    tableId.set(parseUUID(snapshotDirMatcher.group("tableId")));
}
{code}
Then we do not need to change anything in SnapshotManifest / TableSnapshot.
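
The fallback above can be tried outside Cassandra with a minimal sketch. 
Everything here is an assumption made for illustration: TableIdResolver, 
parseHexUuid and the liveTables map are hypothetical, and the map only stands 
in for the ColumnFamilyStore.getIfExists lookup.

```java
import java.util.Map;
import java.util.UUID;

final class TableIdResolver
{
    /** Parses a 32-character hex string without dashes into a UUID. */
    static UUID parseHexUuid(String hex)
    {
        if (hex.length() != 32)
            throw new IllegalArgumentException("expected 32 hex characters: " + hex);
        long msb = Long.parseUnsignedLong(hex.substring(0, 16), 16);
        long lsb = Long.parseUnsignedLong(hex.substring(16), 16);
        return new UUID(msb, lsb);
    }

    /**
     * Resolves a table id, preferring the one embedded in the folder name.
     *
     * @param idFromDirName id parsed from the folder name, or null if absent
     * @param liveTables    hypothetical stand-in for the schema lookup,
     *                      keyed by "keyspace.table"
     * @return the id, or null when neither source knows the table
     */
    static UUID resolve(String idFromDirName, String keyspace, String table,
                        Map<String, UUID> liveTables)
    {
        if (idFromDirName != null)
            return parseHexUuid(idFromDirName);
        // Fallback: only works while the table still exists in the schema
        return liveTables.get(keyspace + "." + table);
    }
}
```

In this sketch a dropped table has no entry in the lookup, so its snapshots 
would still end up with a null id and would need separate handling.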




[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063478#comment-18063478
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/6/26 6:00 PM:
---

1) I do not think that doing this for anything but trunk is a good idea (or at 
least should be approached with caution). 

2) The main reason you are doing this is that you have a 2.x node with 
snapshots without a table id in the directory name and you are upgrading. That 
is the only reason you are trying to introduce the table id into the manifest; 
apart from that, that "code path" is not normally exercised. In the context of 
newer branches, snapshots without a table id in the directory name are invalid. 
I am not sure we should adapt the code just for this corner case which comes 
from the times of 2.x.

3) My concern is that introducing the table id into the manifest like that 
might bring some compatibility issues. However, thinking about it more, we have 
"ignore unknown properties" on the DTO, so deserializing JSON which contains 
that field into that object _should_ work, but it would be nice to have some 
test around that.

4) My preferred way of dealing with this is to do it when a node starts. The 
snapshots are currently loaded by SnapshotManager; we could scan the 
directories even before that, identify the ones which do not have an ID in 
them, and rename them to contain table ids. I guess this would also mean that 
we would need to re-hard-link the snapshot files; I am not sure what happens 
when we rename a directory with hardlinks in it, or whether it is 
re-hard-linked automatically. Another technique would be:

  1) create snapshot directory with id
  2) for each file in old dir
     2a) if old file is not a hardlink, _move_ it to new dir
     2b) if old file is a hardlink, create hardlink in a new dir and remove 
         the old hard link
  3) remove old dir

This will move it in a manner which does not occupy any additional disk space, 
as you are either moving or just hard-linking.
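
On the question of what a rename does to hard links: on POSIX filesystems a 
hard link is a directory entry pointing at an inode, so renaming the parent 
directory within the same filesystem should leave the links untouched. A 
minimal sketch to check this, assuming a temp directory on a filesystem that 
supports hard links; the class name and the folder names are made up for the 
example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RenameKeepsHardLinks
{
    /**
     * Hard-links a file into a directory, renames that directory, and
     * reports whether the renamed entry still refers to the same file.
     */
    static boolean renamePreservesLink()
    {
        try
        {
            Path root = Files.createTempDirectory("snapshot-demo");
            // Stand-in for a live sstable component
            Path live = Files.write(root.resolve("data.db"), new byte[]{ 1, 2, 3 });

            // Old-style folder without a table id, holding a hard link to the file
            Path oldDir = Files.createDirectory(root.resolve("tb"));
            Files.createLink(oldDir.resolve("data.db"), live);

            // Rename the folder so that it carries a (made-up) table id
            Path newDir = Files.move(oldDir, root.resolve("tb-0123456789abcdef0123456789abcdef"));

            // Same underlying file means no re-linking and no extra disk space
            return Files.isSameFile(live, newDir.resolve("data.db"));
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println("link preserved after rename: " + renamePreservesLink());
    }
}
```

If the check holds on the target filesystem, a plain directory rename would be 
enough and the per-file move / re-link steps would not be needed.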


If it breaks in the middle of doing this, on another try you just need to start 
over. It is invalid to have "snapshot-name-12345" (number being table id) as 
well as "snapshot-name" directory _simultaneously_ on a disk. You will always 
know that you need to "finish this". 

EDIT: All of that above for point 4) is wrong, snapshots do not have ids in 
them. Tables have. 

Then SnapshotManager loading logic does not need to change and all snapshots 
will contain table id in their directory paths, as it should be.

If we do not rename the directories, and table-id-less snapshots are accepted / 
recognized by newer nodes, then once you start to take snapshots on them after 
an upgrade, would you not end up with two directory structures on disk, a mix 
of snapshot dirs _without_ table id and dirs _with_ table id? That is quite 
unfortunate because, while it might work from the perspective of the node 
itself, it might disrupt various backup tooling which works in an offline 
manner, just by scanning the dirs outside of Cassandra's process, and the 
parsing logic of such tooling would need to be adapted as well. No fun.

We might indeed entertain the idea of introducing table id into a manifest, or 
modify the content of manifest to contain way more information (hashes of 
files, their sizes ...), and yes, table id as well, but the enrichment of 
manifest.json like that needs a completely separate discussion which should 
happen independently from your necessity to load older snapshots into a newer 
node. 



[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063478#comment-18063478
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/6/26 9:13 AM:
---

1) I do not think that doing this for anything but trunk is a good idea (or at 
least should be approached with caution). 

2) The main reason you are doing this is that you have a 2.x node with 
snapshots without a table id in the directory name and you are upgrading. That 
is the only reason you are trying to introduce table id into the manifest; 
apart from that, that "code path" is not normally exercised. In the context of 
newer branches, snapshots without table id in a directory name are invalid. I 
am not sure we should accommodate the code just for this corner case, which 
dates from the times of 2.x.

3) My concerns are that this might bring some compatibility issues when you 
introduce table id into manifest like that. However, thinking about it more, we 
have "ignore unknown properties" in the DTO when we deserialize JSON with that 
field into that object, so it _should_ work, but it would be nice to have some 
test around that.

4) My preferred way of dealing with this is to do it at node startup. Right 
now the snapshots are scanned and loaded by SnapshotManager. What we could do 
is scan them even before that, identify snapshots which do not have an id in 
their directory name, and rename these directories to contain table ids. This 
might also mean that we would need to re-hard-link the snapshot files, I 
guess; I am not sure what happens when we rename a directory with hardlinks in 
it, whether it is re-hard-linked automatically. Another technique would be:

1) create snapshot directory with id
2) for each file in old dir
2a) if old file is not a hardlink, _move_ it to new dir
2b) if old file is a hardlink, create hardlink in a new dir and remove the old 
hard link
3) remove old dir

This will move it in a manner which does not occupy any additional disk space 
as you are either moving or just hard-linking.

If it breaks in the middle of doing this, on another try you just need to start 
over. It is invalid to have "snapshot-name-12345" (number being table id) as 
well as "snapshot-name" directory _simultaneously_ on a disk. You will always 
know that you need to "finish this". 
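
The steps above could be sketched roughly as follows. This is only an illustration, not Cassandra code: the class and method names are hypothetical, the snapshot directory is assumed to be a flat directory of regular files on the same filesystem as the target, and the link count is read through the "unix:nlink" attribute, which is only available on Unix-like platforms. Running it again after an interrupted attempt just continues with whatever is left in the old directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public final class SnapshotDirMigration
{
    // Hypothetical helper. Assumes a flat directory of regular files and that
    // oldDir and newDir live on the same filesystem.
    public static void migrate(Path oldDir, Path newDir) throws IOException
    {
        if (!Files.isDirectory(oldDir))
            return; // nothing (left) to migrate

        // 1) create the directory with the id; a no-op if a previous attempt created it
        Files.createDirectories(newDir);

        // list the entries up front so the directory is not modified while iterating
        List<Path> entries;
        try (Stream<Path> stream = Files.list(oldDir))
        {
            entries = stream.collect(Collectors.toList());
        }

        // 2) for each file in the old dir
        for (Path oldFile : entries)
        {
            Path newFile = newDir.resolve(oldFile.getFileName());
            if (Files.exists(newFile))
            {
                // leftover of an interrupted attempt: the new link is already there
                if (!Files.isSameFile(oldFile, newFile))
                    throw new IOException("Conflicting file " + newFile);
                Files.delete(oldFile);
            }
            else if (linkCount(oldFile) == 1)
            {
                // 2a) not hard-linked from anywhere else: move it
                Files.move(oldFile, newFile, StandardCopyOption.ATOMIC_MOVE);
            }
            else
            {
                // 2b) hard-linked: link it into the new dir, then remove the old link
                Files.createLink(newFile, oldFile);
                Files.delete(oldFile);
            }
        }

        // 3) remove the old, now empty, dir
        Files.delete(oldDir);
    }

    // Number of directory entries pointing at the file (Unix-only attribute).
    private static int linkCount(Path file) throws IOException
    {
        return ((Number) Files.getAttribute(file, "unix:nlink")).intValue();
    }
}
```

Neither branch copies data, so no additional disk space is needed. On POSIX filesystems a rename only changes the directory entry, so a plain move would most likely preserve hard links as well; the two branches are kept here only to mirror the steps as listed.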

Then SnapshotManager loading logic does not need to change and all snapshots 
will contain table id in their directory paths, as it should be.
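
Identifying the directories that still lack an id could look something like the sketch below. It assumes that a directory carrying an id is named "<table>-<32 lowercase hex characters>" and that a legacy one is just "<table>"; the class and method names are made up for illustration and this is not the regex SnapshotLoader uses.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class TableDirNames
{
    // Assumption: "<table>-<32 lowercase hex chars>" when the id is embedded.
    private static final Pattern WITH_ID = Pattern.compile("(\\w+)-([0-9a-f]{32})");

    /** Returns the id embedded in the directory name, or empty for a legacy name. */
    public static Optional<String> embeddedId(String dirName)
    {
        Matcher m = WITH_ID.matcher(dirName);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    /** Name a legacy directory would be renamed to once its id is known. */
    public static String withId(String legacyName, String idHex)
    {
        return legacyName + '-' + idHex;
    }
}
```

A startup scan would call embeddedId on every candidate directory and queue the ones returning empty for the rename, with the id looked up from the schema.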

If we do not rename the directories, and table-id-less snapshots are 
accepted / recognized by newer nodes, then once you start taking snapshots on 
them after an upgrade you would end up with two directory structures on disk: 
a mix of snapshot dirs _without_ table id as well as dirs _with_ table id. 
That is quite unfortunate because, while it might work from the perspective of 
a node itself, it might disrupt various backup tooling which works in an 
offline manner, just by scanning the dirs outside of Cassandra's process, and 
the parsing logic of such tooling would need to be adapted as well. No fun.

We might indeed entertain the idea of introducing table id into the manifest, 
or of modifying the manifest to contain way more information (hashes of 
files, their sizes ...), and yes, table id as well, but enriching 
manifest.json like that needs a completely separate discussion, which should 
happen independently of your need to load older snapshots into a newer 
node. 



[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063478#comment-18063478
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/6/26 9:05 AM:
---

1) I do not think that doing this for anything but trunk is a good idea (or at 
least should be approached with caution). 

2) The main reason you are doing this is that you have 2.x node with snapshots 
without table id in a directory name and you are upgrading. That is the only 
reason you are trying to introduce table id into the manifest, apart from that, 
that "code path" is not normally exercised. In the context of newer branches, 
snapshots without table id in a directory name are invalid. I am not sure we 
should accommodate the code just for this corner case which comes from times of 
2.x.

3) My concerns are that this might bring some compatibility issues when you 
introduce table id into manifest like that. However, thinking about it more, we 
have "ignore unknown properties" in the DTO when we deserialize JSON with that 
field into that object, so it _should_ work, but it would be nice to have some 
test around that.

4) My preferred way of dealing with this is that when a node starts, it will 
scan the snapshots, they are loaded right now by SnapshotManager. What we could 
do is to scan it even before that, identify snapshots which do not have ID in 
them, and rename these directories to contain table ids (this would also mean 
that we would need to re-hard-link the snapshots files, I guess, not sure what 
happens when we rename a directory with hardlinks in it, if it is 
re-hard-linked automatically). Then SnapshotManager loading logic does not need 
to change and all snapshots will contain table id in their directory paths, as 
it should be.

If we do not rename the directories, then if table-id-less snapshots are 
accepted / recognized by newer nodes and you start to take snapshots on them 
after an upgrade, that would mean that you would end up with two directory 
structures at disk? There would be a mix of snapshot dirs _without_ table id as 
well as dirs _with_ table id? That is quite unfortunate because while it might 
work from the perspective of a node itself, this might disrupt various backup 
tooling which works in an offline manner, just by scanning the dirs, outside of 
Cassandra's process, and the parsing logic of such tooling would need to be 
accommodated as well etc ... No fun.

We might indeed entertain the idea of introducing table id into a manifest, or 
modify the content of manifest to contain way more information (hashes of 
files, their sizes ...), and yes, table id as well, but the enrichment of 
manifest.json like that needs a completely separate discussion which should 
happen independently from your necessity to load older snapshots into a newer 
node. 



[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063478#comment-18063478
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/6/26 9:01 AM:
---

1) I do not think that doing this for anything but trunk is a good idea (or at 
least should be approached with caution). 

2) The main reason you are doing this is that you have 2.x node with snapshots 
without table id in a directory name and you are upgrading. That is the only 
reason you are trying to introduce table id into the manifest, apart from that, 
that "code path" is not normally exercised. In the context of newer branches, 
snapshots without table id in a directory name are invalid. I am not sure we 
should accommodate the code just for this corner case which comes from times of 
2.x.

3) My concerns are that this might bring some compatibility issues when you 
introduce table id into manifest like that. However, thinking about it more, we 
have "ignore unknown properties" in the DTO when we deserialize JSON with that 
field into that object, so it _should_ work, but it would be nice to have some 
test around that.

4) My preferred way of dealing with this is that when a node starts, it will 
scan the snapshots, they are loaded right now by SnapshotManager. What we could 
do is to scan it even before that, identify snapshots which do not have ID in 
them, and rename these directories to contain table ids (this would also mean 
that we would need to re-hard-link the snapshots files, I guess, not sure what 
happens when we rename a directory with hardlinks in it, if it is 
re-hard-linked automatically). Then SnapshotManager loading logic does not need 
to change and all snapshots will contain table id in their directory paths, as 
it should be.

If we do not rename the directories, then if table-id-less snapshots are 
accepted / recognized by newer nodes and you start to take snapshots on them 
after an upgrade, that would mean that you would end up with two directory 
structures at disk? There would be a mix of snapshot dirs _without_ table id as 
well as dirs _with_ table id? That is quite unfortunate because while it might 
work from the perspective of a node itself, this might disrupt various backup 
tooling which works in an offline manner, just by scanning the dirs, outside of 
Cassandra's process, and the parsing logic of such tooling would need to be 
accommodated as well etc ... No fun.

We might indeed entertain the idea of introducing table id into a manifest, or 
modify the content of manifest to contain way more information (hashes of 
files, their sizes ...), and yes, table id as well, but the enrichment of 
manifest.json like that needs a completely separate discussion which should 
happen independently from your necessity to load older snapshots into a newer 
node. 



> Snapshots from tables wi

[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063478#comment-18063478
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/6/26 8:58 AM:
---

1) I do not think that doing this for anything but trunk is a good idea (or at 
least should be approached with caution). 

2) The main reason you are doing this is that you have 2.x node with snapshots 
without table id in a directory name and you are upgrading. That is the only 
reason you are trying to introduce table id into the manifest, apart from that, 
that "code path" is not normally exercised. In the context of newer branches, 
snapshots without table id in a directory name are invalid. I am not sure we 
should accommodate the code just for this corner case which comes from times of 
2.x.

3) My concerns are that this might bring some compatibility issues when you 
introduce table id into manifest like that. However, thinking about it more, we 
have "ignore unknown properties" in the DTO when we deserialize JSON with that 
field into that object, so it _should_ work, but it would be nice to have some 
test around that.

4) My preferred way of dealing with this is that when a node starts, it will 
scan the snapshots, they are loaded right now by SnapshotManager. What we could 
do is to scan it even before that, identify snapshots which do not have ID in 
them, and rename these directories to contain table ids (this would also mean 
that we would need to re-hard-link the snapshots files, I guess, not sure what 
happens when we rename a directory with hardlinks in it, if it is 
re-hard-linked automatically). Then SnapshotManager loading logic does not need 
to change and all snapshots will contain table id in their directory paths, as 
it should be.

We might indeed entertain the idea of introducing table id into a manifest, or 
modify the content of manifest to contain way more information (hashes of 
files, their sizes ...), and yes, table id as well, but the enrichment of 
manifest.json like that needs a completely separate discussion which should 
happen independently from your necessity to load older snapshots into a newer 
node. 



> Snapshots from tables without table-id embedded in their folder name are not 
> loaded by SnapshotLoader
> -
>
> Key: CASSANDRA-21173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21173
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Local/Snapshots, Local/Startup and Shutdown
> Reporter: Matt Byrd
> Assignee: Matt Byrd
> Priority: Normal
> Fix For: 5.0.x, 5.1, 6.x
>
> Attachments: ci_summary_trunk_mbyrd_CASSANDRA-21173.html
>
>
> Tables created prior 

[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063478#comment-18063478
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/6/26 8:56 AM:
---

1) I do not think that doing this for anything but trunk is a good idea (or at 
least should be approached with caution). 
2) The main reason you are doing this is that you have 2.x node with snapshots 
without table id in a directory name and you are upgrading. That is the only 
reason you are trying to introduce table id into the manifest, apart from that, 
that "code path" is not normally exercised. In the context of newer branches, 
snapshots without table id in a directory name are invalid. I am not sure we 
should accommodate the code just for this corner case which comes from times of 
2.x.
3) My concerns are that this might bring some compatibility issues when you 
introduce table id into manifest like that. However, thinking about it more, we 
have "ignore unknown properties" in the DTO when we deserialize JSON with that 
field into that object, so it _should_ work, but it would be nice to have some 
test around that.
4) My preferred way of dealing with this is that when a node starts, it will 
scan the snapshots, they are loaded right now by SnapshotManager. What we could 
do is to scan it even before that, identify snapshots which do not have ID in 
them, and rename these directories to contain table ids (this would also mean 
that we would need to re-hard-link the snapshots files, I guess, not sure what 
happens when we rename a directory with hardlinks in it, if it is 
re-hard-linked automatically). Then SnapshotManager loading logic does not need 
to change and all snapshots will contain table id in their directory paths, as 
it should be.

We might indeed entertain the idea of introducing table id into a manifest, or 
modify the content of manifest to contain way more information (hashes of 
files, their sizes ...), and yes, table id as well, but the enrichment of 
manifest.json like that needs a completely separate discussion which should 
happen independently from your necessity to load older snapshots into a newer 
node. 



> Snapshots from tables without table-id embedded in their folder name are not 
> loaded by SnapshotLoader
> -
>
> Key: CASSANDRA-21173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21173
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Local/Snapshots, Local/Startup and Shutdown
> Reporter: Matt Byrd
> Assignee: Matt Byrd
> Priority: Normal
> Fix For: 5.0.x, 5.1, 6.x
>
> Attachments: ci_summary_trunk_mbyrd_CASSANDRA-21173.html
>
>
> Tables created prior to 2.1 do not have a table-id embedded in their table 

[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063478#comment-18063478
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/6/26 8:54 AM:
---

1) I do not think that doing this for anything but trunk is a good idea.
2) The main reason you are doing this is that you have 2.x node with snapshots 
without table id in a directory name and you are upgrading. That is the only 
reason you are trying to introduce table id into the manifest, apart from that, 
that "code path" is not normally exercised. In the context of newer branches, 
snapshots without table id in a directory name are invalid. I am not sure we 
should accommodate the code just for this corner case which comes from times of 
2.x.
3) My concerns are that this might bring some compatibility issues when you 
introduce table id into manifest like that. However, thinking about it more, we 
have "ignore unknown properties" in the DTO when we deserialize JSON with that 
field into that object, so it _should_ work, but it would be nice to have some 
test around that.
4) My preferred way of dealing with this is that when a node starts, it will 
scan the snapshots, they are loaded right now by SnapshotManager. What we could 
do is to scan it even before that, identify snapshots which do not have ID in 
them, and rename these directories to contain table ids (this would also mean 
that we would need to re-hard-link the snapshots files, I guess, not sure what 
happens when we rename a directory with hardlinks in it, if it is 
re-hard-linked automatically). Then SnapshotManager loading logic does not need 
to change and all snapshots will contain table id in their directory paths, as 
it should be.

We might indeed entertain the idea of introducing table id into a manifest, or 
modify the content of manifest to contain way more information (hashes of 
files, their sizes ...), and yes, table id as well, but the enrichment of 
manifest.json like that needs a completely separate discussion which should 
happen independently from your necessity to load older snapshots into a newer 
node. 



> Snapshots from tables without table-id embedded in their folder name are not 
> loaded by SnapshotLoader
> -
>
> Key: CASSANDRA-21173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21173
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Local/Snapshots, Local/Startup and Shutdown
> Reporter: Matt Byrd
> Assignee: Matt Byrd
> Priority: Normal
> Fix For: 5.0.x, 5.1, 6.x
>
> Attachments: ci_summary_trunk_mbyrd_CASSANDRA-21173.html
>
>
> Tables created prior to 2.1 do not have a table-id embedded in their table 
> folder name.
> This is handled correctly in Directories.java (see constructor) unfortunately 
> in SnapshotLoader, we use a regex which attempts to extract the table-id and 
> hence skips over any tables created prior to 2.1.
> The end result is tha

[jira] [Comment Edited] (CASSANDRA-21173) Snapshots from tables without table-id embedded in their folder name are not loaded by SnapshotLoader

2026-03-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063478#comment-18063478
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21173 at 3/6/26 8:51 AM:
---

1) I do not think that doing this for anything but trunk is a good idea.
2) The main reason you are doing this is that you have 2.x node with snapshots 
without table id in a directory name and you are upgrading. That is the only 
reason you are trying to introduce table id into the manifest, apart from that, 
that "code path" is not normally exercised. In the context of newer branches, 
snapshots without table id in a directory name are invalid. I am not sure we 
should accommodate the code just for this corner case which comes from times of 
2.x.
3) My concerns are that this might bring some compatibility issues when you 
introduce table id into manifest like that. However, thinking about it more, we 
have "ignore unknown properties" in the DTO when we deserialize JSON with that 
field into that object, so it _should_ work, but it would be nice to have some 
test around that.
4) My preferred way of dealing with this is that when a node starts, it will 
scan the snapshots, they are loaded right now by SnapshotManager. What we could 
do is to scan it even before that, identify snapshots which do not have ID in 
them, and rename these directories to contain table ids. Then SnapshotManager 
loading logic does not need to change and all snapshots will contain table id 
in their directory paths, as it should be.

We might indeed entertain the idea of introducing table id into a manifest, or 
modify the content of manifest to contain way more information (hashes of 
files, their sizes ...), and yes, table id as well, but the enrichment of 
manifest.json like that needs a completely separate discussion which should 
happen independently from your necessity to load older snapshots into a newer 
node. 



> Snapshots from tables without table-id embedded in their folder name are not 
> loaded by SnapshotLoader
> -
>
> Key: CASSANDRA-21173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21173
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Local/Snapshots, Local/Startup and Shutdown
> Reporter: Matt Byrd
> Assignee: Matt Byrd
> Priority: Normal
> Fix For: 5.0.x, 5.1, 6.x
>
> Attachments: ci_summary_trunk_mbyrd_CASSANDRA-21173.html
>
>
> Tables created prior to 2.1 do not have a table-id embedded in their table 
> folder name.
> This is handled correctly in Directories.java (see constructor) unfortunately 
> in SnapshotLoader, we use a regex which attempts to extract the table-id and 
> hence skips over any tables created prior to 2.1.
> The end result is that these tables are not visible in list snapshot and more 
> importantly cannot be cleared via nodetool clearsnapshot. This was noticed 
> upon major upgrade to 5.0.
> I've observed this on 5.0, from