[jira] [Commented] (CASSANDRA-1983) Make sstable filenames contain a UUID instead of increasing integer
[ https://issues.apache.org/jira/browse/CASSANDRA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903877#comment-13903877 ] Daniel Shelepov commented on CASSANDRA-1983: Notes so far: - sstable filenames are controlled by the io/sstable/Descriptor class, which encapsulates a few parameters including generation -- the increasing integer in question. - dropping generation in favor of a uuid seems questionable, given that generation is used by a wide variety of clients in the codebase. So the most likely approach is uuid + generation side by side. - using the host id as the uuid is easy conceptually, but will violate layering, because code in io will start to depend on db and/or service. Plus there is potential bootstrapping problem where system sstables need to be initialized early on during boot, and it's not clear whether the unique host id is available early enough to feed into system sstable descriptors. - random uuids are also tricky, because sstable names will no longer be discoverable without directory lookups. Some code (particularly in unit tests) leans on the ability to synthesize sstable names without touching the filesystem. It's possible to persist these uuids in one of the system tables, but it will have to be a local table, and, regardless, changing system schema can make this a breaking change. I haven't yet found a cost-effective fix that would involve actually modifying the existing naming scheme. The latest idea I have is to create a directory that will hold symlinks to real sstables (symlinks are available in Java 7). Symlink names will contain the UUIDs. The only extra piece of code would be creating and tearing down symlinks when real sstables are created and deleted. End users could then access sstables through this symlink directory whenever doing related maintenance. The last piece would be making sure that appropriate clients, such as the compactor, can consume sstables with and without UUIDs. I'll work on this some more tomorrow, but it'll probably spill until next week (or later). Comments welcome. Make sstable filenames contain a UUID instead of increasing integer --- Key: CASSANDRA-1983 URL: https://issues.apache.org/jira/browse/CASSANDRA-1983 Project: Cassandra Issue Type: Improvement Reporter: David King Priority: Minor sstable filenames look like CFName-1569-Index.db, containing an integer for uniqueness. This makes it possible (however unlikely) that the integer could overflow, which could be a problem. It also makes it difficult to collapse multiple nodes into a single one with rsync. I do this occasionally for testing: I'll copy our 20 node cluster into only 3 nodes by copying all of the data files and running cleanup; at present this requires a manual step of uniqifying the overlapping sstable names. If instead of an incrementing integer, it would be handy if these contained a UUID or somesuch that guarantees uniqueness across the cluster. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-1983) Make sstable filenames contain a UUID instead of increasing integer
[ https://issues.apache.org/jira/browse/CASSANDRA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13904441#comment-13904441 ] Daniel Shelepov commented on CASSANDRA-1983: OK, but distinct symlinks are much easier to make unique, because they won't affect all that code that expects to find sstables under well-known names (regular names still being used in regular sstable storage). The fact that they're symlinks allows decoupling the problem from internal naming requirements. Make sstable filenames contain a UUID instead of increasing integer --- Key: CASSANDRA-1983 URL: https://issues.apache.org/jira/browse/CASSANDRA-1983 Project: Cassandra Issue Type: Improvement Reporter: David King Priority: Minor sstable filenames look like CFName-1569-Index.db, containing an integer for uniqueness. This makes it possible (however unlikely) that the integer could overflow, which could be a problem. It also makes it difficult to collapse multiple nodes into a single one with rsync. I do this occasionally for testing: I'll copy our 20 node cluster into only 3 nodes by copying all of the data files and running cleanup; at present this requires a manual step of uniqifying the overlapping sstable names. If instead of an incrementing integer, it would be handy if these contained a UUID or somesuch that guarantees uniqueness across the cluster. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-1983) Make sstable filenames contain a UUID instead of increasing integer
[ https://issues.apache.org/jira/browse/CASSANDRA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900892#comment-13900892 ] Jonathan Ellis commented on CASSANDRA-1983: --- Yes. Make sstable filenames contain a UUID instead of increasing integer --- Key: CASSANDRA-1983 URL: https://issues.apache.org/jira/browse/CASSANDRA-1983 Project: Cassandra Issue Type: Improvement Reporter: David King Priority: Minor sstable filenames look like CFName-1569-Index.db, containing an integer for uniqueness. This makes it possible (however unlikely) that the integer could overflow, which could be a problem. It also makes it difficult to collapse multiple nodes into a single one with rsync. I do this occasionally for testing: I'll copy our 20 node cluster into only 3 nodes by copying all of the data files and running cleanup; at present this requires a manual step of uniqifying the overlapping sstable names. If instead of an incrementing integer, it would be handy if these contained a UUID or somesuch that guarantees uniqueness across the cluster. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-1983) Make sstable filenames contain a UUID instead of increasing integer
[ https://issues.apache.org/jira/browse/CASSANDRA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900054#comment-13900054 ] Daniel Shelepov commented on CASSANDRA-1983: Is this still needed? Naming in 2.0+ is still incremental as far as I can tell. I'd like to work on this fix while I'm learning the codebase. Make sstable filenames contain a UUID instead of increasing integer --- Key: CASSANDRA-1983 URL: https://issues.apache.org/jira/browse/CASSANDRA-1983 Project: Cassandra Issue Type: Improvement Reporter: David King Priority: Minor sstable filenames look like CFName-1569-Index.db, containing an integer for uniqueness. This makes it possible (however unlikely) that the integer could overflow, which could be a problem. It also makes it difficult to collapse multiple nodes into a single one with rsync. I do this occasionally for testing: I'll copy our 20 node cluster into only 3 nodes by copying all of the data files and running cleanup; at present this requires a manual step of uniqifying the overlapping sstable names. If instead of an incrementing integer, it would be handy if these contained a UUID or somesuch that guarantees uniqueness across the cluster. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-1983) Make sstable filenames contain a UUID instead of increasing integer
[ https://issues.apache.org/jira/browse/CASSANDRA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262809#comment-13262809 ] Robert Coli commented on CASSANDRA-1983: +1 here, global uniqueness for sstable names would make many copy-the-sstables style maint operations easier, as you wouldn't have to manually resolve the namespace conflict. just now I saw someone in #cassandra who was setting up a cluster with a copy of data get confused by non-unique filenames being overwritten on his new cluster. the only downside seems to be longer sstable file names. Make sstable filenames contain a UUID instead of increasing integer --- Key: CASSANDRA-1983 URL: https://issues.apache.org/jira/browse/CASSANDRA-1983 Project: Cassandra Issue Type: Improvement Reporter: David King Priority: Minor sstable filenames look like CFName-1569-Index.db, containing an integer for uniqueness. This makes it possible (however unlikely) that the integer could overflow, which could be a problem. It also makes it difficult to collapse multiple nodes into a single one with rsync. I do this occasionally for testing: I'll copy our 20 node cluster into only 3 nodes by copying all of the data files and running cleanup; at present this requires a manual step of uniqifying the overlapping sstable names. If instead of an incrementing integer, it would be handy if these contained a UUID or somesuch that guarantees uniqueness across the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CASSANDRA-1983) Make sstable filenames contain a UUID instead of increasing integer
[ https://issues.apache.org/jira/browse/CASSANDRA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981507#action_12981507 ] Ryan King commented on CASSANDRA-1983: -- Alternatively, since we'll need a host-uuid mapping for counters we can put that uuid in the filename along with a serial integer (make it a long and we should be ok, right?) Make sstable filenames contain a UUID instead of increasing integer --- Key: CASSANDRA-1983 URL: https://issues.apache.org/jira/browse/CASSANDRA-1983 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.0 Reporter: David King Priority: Minor sstable filenames look like CFName-1569-Index.db, containing an integer for uniqueness. This makes it possible (however unlikely) that the integer could overflow, which could be a problem. It also makes it difficult to collapse multiple nodes into a single one with rsync. I do this occasionally for testing: I'll copy our 20 node cluster into only 3 nodes by copying all of the data files and running cleanup; at present this requires a manual step of uniqifying the overlapping sstable names. If instead of an incrementing integer, it would be handy if these contained a UUID or somesuch that guarantees uniqueness across the cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1983) Make sstable filenames contain a UUID instead of increasing integer
[ https://issues.apache.org/jira/browse/CASSANDRA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981509#action_12981509 ] David King commented on CASSANDRA-1983: --- As long as the host is still willing to read filenames without its own uuid, sure Make sstable filenames contain a UUID instead of increasing integer --- Key: CASSANDRA-1983 URL: https://issues.apache.org/jira/browse/CASSANDRA-1983 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.0 Reporter: David King Priority: Minor sstable filenames look like CFName-1569-Index.db, containing an integer for uniqueness. This makes it possible (however unlikely) that the integer could overflow, which could be a problem. It also makes it difficult to collapse multiple nodes into a single one with rsync. I do this occasionally for testing: I'll copy our 20 node cluster into only 3 nodes by copying all of the data files and running cleanup; at present this requires a manual step of uniqifying the overlapping sstable names. If instead of an incrementing integer, it would be handy if these contained a UUID or somesuch that guarantees uniqueness across the cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.