Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10941 )
Change subject: WIP Add error handling to KuduBackup DONT_BUILD KuduBackup will attempt to backup every table. If any one table fails the entire job will fail. ...................................................................... Patch Set 2: (19 comments) http://gerrit.cloudera.org:8080/#/c/10941/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10941/2//COMMIT_MSG@7 PS2, Line 7: WIP Add error handling to KuduBackup Nit: This subject is really long. Shows up as 4 lines on gerrit. http://gerrit.cloudera.org:8080/#/c/10941/2//COMMIT_MSG@21 PS2, Line 21: - if 'overwrite' flag is specified, overwrite the 'metadata' I think we can keep it simple and let users manually delete files they don't want for now. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupStatus.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupStatus.scala: http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupStatus.scala@5 PS2, Line 5: class BackupStatus(val tableName: String, val isSuccess: Boolean) { Since their isn't a ton of custom functionality perhaps using Scala's built in Try directly is good enough and more lightweight. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@39 PS2, Line 39: def main(args: Array[String]): Unit = { Is there a reason you moved main to the top of the class? I think we should try to keep the layout between KuduBackup and KuduRestore similar since the processes are so similar. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@56 PS2, Line 56: throw new Exception( Nit. Use a more specific exception. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@57 PS2, Line 57: s"Failed backing up at least one table. See previous error messages for details. Failed tables: ${failures.mkString(",")}") Instead of "at least one". Include the count that failed out of the total submitted. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@70 PS2, Line 70: // @TODO: do incremental backup goes here I am not sure this is how we will handle incremental. I am thinking we would pass an argument requesting incremental vs full. In the case of a full backup I think we can rely on Sparks SaveMode semantics to detect existing data. That covers more cases because any data existing, not just the metadata thats successful, we should likely fail on. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@74 PS2, Line 74: context <- Try(new KuduContext(options.kuduMasterAddresses, session.sparkContext)); Why did this context get move here? I think it can be shared for all uses in the application. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@87 PS2, Line 87: case Success(_) => log.error(s"Removed partial backup directory: $tablePath") Maybe info level http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@96 PS2, Line 96: private def writeMetadata(options: KuduBackupOptions, session: SparkSession, tablePath: Path, table: KuduTable) = { Nit: add explicit return type Could this method just be merged into writeTableMetadata? http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@127 PS2, Line 127: val conf = session.sparkContext.hadoopConfiguration Now that we do this 3 line preamble to get the filesystem and hpath a few times, maybe we should move these methods to a utility? http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@135 PS2, Line 135: private def getTableMetadata(options: KuduBackupOptions, session: SparkSession, tablePath: Path, table: KuduTable) = Try { What is this for? It looks like it writes the metadata. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@66 PS2, Line 66: BackupSuccess(t) Should the be RestoreSuccess/RestoreFailure? Since their isn't a ton of custom functionality perhaps using Scala's built in Try directly is good enough and more lightweight. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@79 PS2, Line 79: def readTableMetadata(path: Path, session: SparkSession): Try[TableMetadataPB] = Try { Nit: Explicit return type. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@91 PS2, Line 91: private def getTable(metadata: TableMetadataPB, options: KuduRestoreOptions, context: KuduContext): Try[KuduTable] = Try { Nit: Explicit return type Maybe getOrCreateTable to be more descriptive? http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala File java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@85 PS2, Line 85: val backupRows = kuduContext.kuduRDD(ss.sparkContext, s"$tableName").collect nit: accidental indent http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@140 PS2, Line 140: test("Backup Failure When No Connection to Kudu") { This could fail because the path is not found. Need to validate why it failed. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@147 PS2, Line 147: test("Restore Throws Exception When Path Not Found") { Need to fill this out. http://gerrit.cloudera.org:8080/#/c/10941/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@151 PS2, Line 151: test("Restore Throws Exception When Table Already Exists") { Need to fill this out. -- To view, visit http://gerrit.cloudera.org:8080/10941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I104fd604e12b70fcff9beace71ed4ef96b54d691 Gerrit-Change-Number: 10941 Gerrit-PatchSet: 2 Gerrit-Owner: Tony Foerster <anthonymfoers...@gmail.com> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tony Foerster <anthonymfoers...@gmail.com> Gerrit-Comment-Date: Tue, 17 Jul 2018 20:11:34 +0000 Gerrit-HasComments: Yes