Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10375 )
Change subject: Kudu Backup/Restore Spark Jobs ...................................................................... Patch Set 17: (25 comments) Overall looks good, all minor feedback. http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto File java/kudu-backup/src/main/protobuf/backup.proto: http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@32 PS17, Line 32: nit: use 2-space indentation in this file for consistency with other .proto files in the code base http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@46 PS17, Line 46: default let's name this default_value since DEFAULT is a keyword in protobuf 2 and it might be a little confusing http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@52 PS17, Line 52: readible readable http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@63 PS17, Line 63: endoded encoded http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@65 PS17, Line 65: repeated ColumnValueMetadataPB lower_bounds = 1; : repeated ColumnValueMetadataPB upper_bounds = 2; Doc why these are repeated and what invariants we have (number of components in lower_bounds and upper_bounds equals number of columns involved in the range partition) http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@98 PS17, Line 98: // The starting point of a backup. A UNIX timestamp in milliseconds since the epoch. : int64 from_ms = 1; : // The end point of a backup. A UNIX timestamp in milliseconds since the epoch. : int64 to_ms = 2; What is the purpose of storing these? Maybe this would be better expressed as startTime and duration? http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@103 PS17, Line 103: name Can we name this table_name? http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@50 PS17, Line 50: TODO: Take parameter for the SaveMode nit: Add a period or punctuation to the end of all your comments per the C++ style guide @ https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar (I know this is not C++) and for consistency with the rest of the Kudu code base. Here and in the rest of this patch. http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66 PS17, Line 66: false small suggestion: /* overwrite= */ false instead of the trailing comment http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala: http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@37 PS17, Line 37: defaultFormat nit: Shouldn't these constants be in UpperCamelCase per https://docs.scala-lang.org/style/naming-conventions.html#constants-values-variable-and-methods ? http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@42 PS17, Line 42: // TODO: clean up usage output nit: punctuation in this file too http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@55 PS17, Line 55: timestampMs should we specify this as microseconds since that's the resolution we can accept internally anyway? http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala: http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@45 PS17, Line 45: // Set a hybrid time for the scan to ensure application consistency nit: comment punctuation here and elsewhere http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@46 PS17, Line 46: options.timestampMs this is an optional parameter, we should either handle the non-specified case or add a TODO for that http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@57 PS17, Line 57: .snapshotTimestampRaw(hybridTime) Do we store this? We need to record the chosen timestamp somewhere in order to support incremental backups http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@78 PS17, Line 78: delta I think the terminology we have been using is "incremental", because deltas already are a thing in Kudu http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@97 PS17, Line 97: and expects an Iterator[Row] in return and expects to return an Iterator[Row] http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@106 PS17, Line 106: How about: while (scanner.hasMoreRows && (currentIterator == null || !currentIterator.hasNext)) http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@107 PS17, Line 107: if (TaskContext.get().isInterrupted()) Why is this check inside the loop? Can we move this to the bottom of the hasNext method? http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@108 PS17, Line 108: Kudu task interrupted how about: "KuduBackup spark task interrupted", or something similar / more specific http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@115 PS17, Line 115: TODO here and elsewhere: when you write TODO put TODO(grant) or TODO(KUDU-1234) or something like that so it's easier to keep track of the action item related to the TODO, per https://google.github.io/styleguide/cppguide.html#TODO_Comments and consistency with the rest of the Kudu code base. http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@43 PS17, Line 43: ins't typo: isn't -- also add punctuation and TODO(...) to comments in this file too http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@43 PS17, Line 43: process s/process/processed/ http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala: http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala@69 PS17, Line 69: opt[String]("format") Why wouldn't this be stored in the metadata? http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala: http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala@99 PS17, Line 99: Partitions nit: s/Partitions/Partition/ -- To view, visit http://gerrit.cloudera.org:8080/10375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7 Gerrit-Change-Number: 10375 Gerrit-PatchSet: 17 Gerrit-Owner: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Tue, 19 Jun 2018 01:31:03 +0000 Gerrit-HasComments: Yes