Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
File 
java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala@107
PS2, Line 107:     fs.delete(backupPath(metadata.getTableId, 
metadata.getTableName, metadata.getToMs), true)
> What is our crash consistency story here?
It depends on the implementation of the filesystem the backups are stored on. 
It deletes the directory recursively, so it's likely it could be partway done 
after a crash. This would be ok as long as the backup metadata is still around. 
If the directory is still there but the metadata is gone, then that would cause 
an error trying to read the backup that's supposed to be in that directory.

To handle that situation, we ought to detect and clean up directories with 
unusable backups, which basically means directories with no metadata. The 
present backup code doesn't handle that situation and it would throw.

I think that addition is a follow-up to this base tool. I added a todo where 
the tool calls this method.


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
File 
java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@90
PS2, Line 90:     // 1. The table name matches the provided names (does not 
apply if no names were specified).
> This will match a table that had the specified name at any time in its hist
You're right. I see the trick that the restore job uses to make sure it's only 
restoring tables whose current name in in the list of specified tables. I've 
applied the same trick here.

Don't worry too much about it, though, because it will change soon. First, when 
single table failures can be non-fatal, and, second, when the table specifiers 
can be patterns.


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@92
PS2, Line 92:     // 3. The backup is not on the current restore path.
> Can we add a TODO here for dropped tables? Maybe something like:
Done


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@102
PS2, Line 102: toInstant
> nit: lastBackupInstant might be a better name
Done


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
File 
java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@51
PS2, Line 51: epochMs
> nit: rename to something like epochMillisBefore()
Done


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@62
PS2, Line 62: ts
> nit: rename ts to ages maybe?
Done


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@118
PS2, Line 118:     // Finally, add a third path which is not the restore path 
but which has backups that are old
> Can we add a test case where there are multiple toMs / fromMs alignments? I
Well, we actually sort of can't because backup dirs are named using only the 
toMs, so you can't actually have a backup state on the fs that has multiple 
toMs alignments. Multiple fromMs alignments happen anytime there's multiple 
full backups, so it's tested here.

Also, if the backup graph handles these cases correctly (returns the right 
backup paths) then the tool will clean them correctly based on the two rules:
1. Delete backups on paths that are expired (head of path is older than 
expiration age).
2. Don't delete backups on the restore path regardless of whatever other path 
the backup is on.

The backup graph already has tests with situations like that too.



--
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Tue, 28 May 2019 16:52:50 +0000
Gerrit-HasComments: Yes

Reply via email to