Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13405 )
Change subject: [backup] Add a tool to clean up old backups ...................................................................... Patch Set 2: (7 comments) Nice work. One thing about this that makes me nervous in general is that it's metadata-only logic and doesn't do any verification of existence or validity of the data files. I wonder if there is something we should do along those lines for data protection reasons. http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala File java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala: http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala@107 PS2, Line 107: fs.delete(backupPath(metadata.getTableId, metadata.getTableName, metadata.getToMs), true) What is our crash consistency story here? http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala File java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala: http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@90 PS2, Line 90: // 1. The table name matches the provided names (does not apply if no names were specified). This will match a table that had the specified name at any time in its history, not just the latest name it had. I wonder if we should restrict it to only the latest name a table with a given id had? Either way, we should doc the precise behavior. http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@92 PS2, Line 92: // 3. The backup is not on the current restore path. Can we add a TODO here for dropped tables? Maybe something like: TODO(KUDU-2827): Consider dropped tables eligible for deletion once they reach a certain age. http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@102 PS2, Line 102: toInstant nit: lastBackupInstant might be a better name http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala File java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala: http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@51 PS2, Line 51: epochMs nit: rename to something like epochMillisBefore() http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@62 PS2, Line 62: ts nit: rename ts to ages maybe? http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@118 PS2, Line 118: // Finally, add a third path which is not the restore path but which has backups that are old Can we add a test case where there are multiple toMs / fromMs alignments? It would be nice to prune an old "twin" daughter path but I'm not sure what do to about multiple "grandparent" paths. I guess we could / should arbitrarily kill one of the ancestor paths? -- To view, visit http://gerrit.cloudera.org:8080/13405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39 Gerrit-Change-Number: 13405 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Comment-Date: Sat, 25 May 2019 01:05:02 +0000 Gerrit-HasComments: Yes
