Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................


Patch Set 2:

(7 comments)

Nice work. One thing about this that makes me nervous in general is that it's 
metadata-only logic and doesn't do any verification of existence or validity of 
the data files. I wonder if there is something we should do along those lines 
for data protection reasons.

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
File 
java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala@107
PS2, Line 107:     fs.delete(backupPath(metadata.getTableId, 
metadata.getTableName, metadata.getToMs), true)
What is our crash consistency story here?


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
File 
java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@90
PS2, Line 90:     // 1. The table name matches the provided names (does not 
apply if no names were specified).
This will match a table that had the specified name at any time in its history, 
not just the latest name it had. I wonder if we should restrict it to only the 
latest name a table with a given id had? Either way, we should doc the precise 
behavior.


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@92
PS2, Line 92:     // 3. The backup is not on the current restore path.
Can we add a TODO here for dropped tables? Maybe something like:

TODO(KUDU-2827): Consider dropped tables eligible for deletion once they reach 
a certain age.


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@102
PS2, Line 102: toInstant
nit: lastBackupInstant might be a better name


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
File 
java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@51
PS2, Line 51: epochMs
nit: rename to something like epochMillisBefore()


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@62
PS2, Line 62: ts
nit: rename ts to ages maybe?


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@118
PS2, Line 118:     // Finally, add a third path which is not the restore path 
but which has backups that are old
Can we add a test case where there are multiple toMs / fromMs alignments? It 
would be nice to prune an old "twin" daughter path but I'm not sure what do to 
about multiple "grandparent" paths. I guess we could / should arbitrarily kill 
one of the ancestor paths?



--
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Comment-Date: Sat, 25 May 2019 01:05:02 +0000
Gerrit-HasComments: Yes

Reply via email to