[GitHub] [cassandra] smiklosovic commented on a diff in pull request #2117: Add a virtual table to list snapshots for CASSANDRA-18102

via GitHub Thu, 02 Mar 2023 05:34:18 -0800


smiklosovic commented on code in PR #2117:
URL: https://github.com/apache/cassandra/pull/2117#discussion_r1123076202



##########
src/java/org/apache/cassandra/db/virtual/SnapshotsTable.java:
##########
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.virtual;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import javax.management.openmbean.TabularData;
+import javax.management.openmbean.TabularDataSupport;
+import com.google.common.collect.ImmutableMap;
+
+import org.apache.cassandra.db.SnapshotDetailsTabularData;
+import org.apache.cassandra.db.marshal.BooleanType;
+import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.dht.LocalPartitioner;
+import org.apache.cassandra.schema.TableMetadata;
+import org.apache.cassandra.service.StorageService;
+
+public class SnapshotsTable extends AbstractVirtualTable
+{
+    private static final String SNAPSHOT_NAME = "snapshot_name";
+    private static final String KEYSPACE_NAME = "keyspace_name";
+    private static final String COLUMNFAMILY_NAME = "columnfamily_name";
+    private static final String TRUE_SIZE = "true_size";
+    private static final String SIZE_ON_DISK = "size_on_disk";
+    private static final String CREATE_TIME = "created_at";
+    private static final String EXPIRATION_TIME = "expires_at";
+    private static final String EPHEMERAL = "ephemeral";
+    
+    SnapshotsTable(String keyspace)
+    {
+        super(TableMetadata.builder(keyspace, "snapshots")
+                           .comment("tables snapshots")
+                           .kind(TableMetadata.Kind.VIRTUAL)
+                           .partitioner(new LocalPartitioner(UTF8Type.instance))
+                           .addPartitionKeyColumn(SNAPSHOT_NAME, UTF8Type.instance)

Review Comment:
   @pauloricardomg 
   
   Regardless of whether we can query without `ALLOW FILTERING` after 18238, I think it is still good practice to do what we would normally do when modeling a schema.
   
   As explained above, and as Maxwell just explained, I think the hierarchy should be keyspace > table > snapshot id.
   
   We can have the same snapshot name more than once, after all, no? So if keyspace and table are clustering columns, two snapshots with the same name would give us this partition:
   
       snapshotName1 | keyspace1 | table1
       snapshotName1 | keyspace2 | table2
   
   The primary key would be `(snapshotName, (keyspaceName, tableName))`.
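   To make that layout concrete, here is a minimal, hypothetical Java sketch (plain Java 16+, no Cassandra classes). It mimics how rows would be grouped with `snapshot_name` as the partition key and `(keyspace_name, table_name)` as clustering columns. `SnapshotRow` and `SnapshotPartitions` are illustrative names only, and the sketch does not model token ordering of partitions.

   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;

   // Hypothetical model of one row of the proposed virtual table (illustration only).
   record SnapshotRow(String snapshotName, String keyspaceName, String tableName) {}

   final class SnapshotPartitions
   {
       // Clustering order inside a partition: keyspace_name, then table_name.
       static final Comparator<SnapshotRow> CLUSTERING =
           Comparator.comparing(SnapshotRow::keyspaceName)
                     .thenComparing(SnapshotRow::tableName);

       // Groups rows by partition key (snapshot_name) and sorts each partition by its
       // clustering columns. TreeMap only makes printing deterministic; Cassandra orders
       // partitions by token, which this sketch does not model.
       static Map<String, List<SnapshotRow>> partition(List<SnapshotRow> rows)
       {
           Map<String, List<SnapshotRow>> partitions = new TreeMap<>();
           for (SnapshotRow row : rows)
               partitions.computeIfAbsent(row.snapshotName(), k -> new ArrayList<>()).add(row);
           partitions.values().forEach(p -> p.sort(CLUSTERING));
           return partitions;
       }

       public static void main(String[] args)
       {
           List<SnapshotRow> rows = List.of(new SnapshotRow("snapshotName1", "keyspace2", "table2"),
                                            new SnapshotRow("snapshotName1", "keyspace1", "table1"));
           // Both rows land in the single partition "snapshotName1",
           // even though they belong to different keyspaces and tables.
           System.out.println(partition(rows));
       }
   }
   ```

   Here the single partition `snapshotName1` holds rows for both `keyspace1.table1` and `keyspace2.table2`. That is the coupling being questioned.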
   
   Isn't it counter-intuitive to have a partition that logically couples different keyspaces and tables under the same snapshot name? That does not make sense to me.
   
   On the other hand, it is possible to do this:
   
       ./bin/nodetool snapshot --kt-list ks.tb,system.local -t mysnapshot
   
   So `listsnapshots` will print this:
   
       mysnapshot  system  local  1.16 KiB  21.47 KiB    2023-03-02T13:19:42.140Z
       mysnapshot  ks      tb     1.02 KiB   6.08 KiB    2023-03-02T13:19:13.757Z
   
   But then I can also run this, reusing the same snapshot name:
   
       ./bin/nodetool snapshot --kt-list ks.tb2 -t mysnapshot
   
   After that, `listsnapshots` prints:
   
       mysnapshot       ks       tb2        1.16 KiB  21.47 KiB    2023-03-02T13:19:42.140Z
       mysnapshot       ks       tb         1.02 KiB  6.08 KiB     2023-03-02T13:19:13.757Z
       mysnapshot       system   local      107 bytes 6.98 KiB     2023-03-02T13:19:13.757Z
   
   So, with the primary key `(id, (keyspace, table))`, the advantage is that we could see at a glance which tables were snapshotted as part of one logical snapshot, based on their nearly identical timestamps. Here, the timestamps show that ks.tb and system.local were snapshotted "together".
   
   So from this perspective it is better if the snapshot id is the partition key.
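   The "visual" grouping by timestamp can also be done mechanically once all rows for one snapshot name come back from a single partition. Below is a minimal sketch in plain Java 16+. It assumes each row carries a `created_at` `Instant`, and that tables snapshotted by the same `nodetool snapshot` invocation have timestamps within a small tolerance of each other. The one-second window is an arbitrary assumption, not a Cassandra guarantee. `SnapshotEntry` and `LogicalSnapshots` are hypothetical names.

   ```java
   import java.time.Duration;
   import java.time.Instant;
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;

   // Hypothetical row of one snapshot-name partition (illustration only).
   record SnapshotEntry(String keyspaceName, String tableName, Instant createdAt) {}

   final class LogicalSnapshots
   {
       // Splits the rows of one partition into "logical snapshots": after sorting by
       // created_at, a new group starts whenever the gap to the previous row exceeds tolerance.
       static List<List<SnapshotEntry>> group(List<SnapshotEntry> partition, Duration tolerance)
       {
           List<SnapshotEntry> sorted = new ArrayList<>(partition);
           sorted.sort(Comparator.comparing(SnapshotEntry::createdAt));

           List<List<SnapshotEntry>> groups = new ArrayList<>();
           List<SnapshotEntry> current = new ArrayList<>();
           Instant previous = null;
           for (SnapshotEntry entry : sorted)
           {
               if (previous != null && Duration.between(previous, entry.createdAt()).compareTo(tolerance) > 0)
               {
                   groups.add(current);
                   current = new ArrayList<>();
               }
               current.add(entry);
               previous = entry.createdAt();
           }
           if (!current.isEmpty())
               groups.add(current);
           return groups;
       }

       public static void main(String[] args)
       {
           // The "mysnapshot" rows from the second listsnapshots output above.
           List<SnapshotEntry> mysnapshot = List.of(
               new SnapshotEntry("ks", "tb2", Instant.parse("2023-03-02T13:19:42.140Z")),
               new SnapshotEntry("ks", "tb", Instant.parse("2023-03-02T13:19:13.757Z")),
               new SnapshotEntry("system", "local", Instant.parse("2023-03-02T13:19:13.757Z")));
           for (List<SnapshotEntry> logical : group(mysnapshot, Duration.ofSeconds(1)))
               System.out.println(logical);
       }
   }
   ```

   With a one-second window this yields two groups: `ks.tb` with `system.local`, and `ks.tb2` on its own. A fixed window is only a heuristic, because two separate `nodetool snapshot` runs that fall inside the window would be merged.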
   
   I would go as far as including the timestamp in the primary key as well: `(id, (keyspace, table, timestamp))`.
   
   This way the rows would also come back ordered, without anything extra having to be specified when querying.
   
   
   If we made it the way I suggested earlier, `(keyspace, table, snapshotid)`, we would lose this information; the output might look like this:
   
       ks1 tb1 snapshot1 2023-03-02T13:19:42.140Z        
       ks4 tb4 snapshot1 2023-03-02T13:19:13.757Z 
       ks3 tb2 snapshot1 2023-03-02T13:19:13.757Z 
       ks3 tb3 snapshot1 2023-03-02T13:19:42.757Z 
   
   Here it is much harder to see that "ks4.tb4" and "ks3.tb2" form one logical snapshot, and "ks1.tb1" and "ks3.tb3" another.


