Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
......................................................................


Patch Set 17:

(25 comments)

Overall looks good, all minor feedback.

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@32
PS17, Line 32:
nit: use 2-space indentation in this file for consistency with other .proto 
files in the code base


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@46
PS17, Line 46: default
let's name this default_value since DEFAULT is a keyword in protobuf 2 and it 
might be a little confusing


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@52
PS17, Line 52: readible
readable


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@63
PS17, Line 63: endoded
encoded


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@65
PS17, Line 65:     repeated ColumnValueMetadataPB lower_bounds = 1;
             :     repeated ColumnValueMetadataPB upper_bounds = 2;
Doc why these are repeated and what invariants we have (number of components in 
lower_bounds and upper_bounds equals number of columns involved in the range 
partition)


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@98
PS17, Line 98:     // The starting point of a backup. A UNIX timestamp in 
milliseconds since the epoch.
             :     int64 from_ms = 1;
             :     // The end point of a backup. A UNIX timestamp in 
milliseconds since the epoch.
             :     int64 to_ms = 2;
What is the purpose of storing these?

Maybe this would be better expressed as startTime and duration?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@103
PS17, Line 103: name
Can we name this table_name?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@50
PS17, Line 50: TODO: Take parameter for the SaveMode
nit: Add a period or punctuation to the end of all your comments per the C++ 
style guide @ 
https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar
 (I know this is not C++) and for consistency with the rest of the Kudu code 
base.

Here and in the rest of this patch.


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66
PS17, Line 66: false
small suggestion: /* overwrite= */ false

instead of the trailing comment


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
File 
java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@37
PS17, Line 37: defaultFormat
nit: Shouldn't these constants be in UpperCamelCase per 
https://docs.scala-lang.org/style/naming-conventions.html#constants-values-variable-and-methods
 ?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@42
PS17, Line 42:   // TODO: clean up usage output
nit: punctuation in this file too


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@55
PS17, Line 55: timestampMs
should we specify this as microseconds since that's the resolution we can 
accept internally anyway?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@45
PS17, Line 45:     // Set a hybrid time for the scan to ensure application 
consistency
nit: comment punctuation here and elsewhere


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@46
PS17, Line 46: options.timestampMs
this is an optional parameter, we should either handle the non-specified case 
or add a TODO for that


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@57
PS17, Line 57: .snapshotTimestampRaw(hybridTime)
Do we store this? We need to record the chosen timestamp somewhere in order to 
support incremental backups


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@78
PS17, Line 78: delta
I think the terminology we have been using is "incremental", because deltas 
already are a thing in Kudu


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@97
PS17, Line 97: and expects an Iterator[Row] in return
and expects to return an Iterator[Row]


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@106
PS17, Line 106:
How about:

  while (scanner.hasMoreRows && (currentIterator == null || 
!currentIterator.hasNext))


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@107
PS17, Line 107: if (TaskContext.get().isInterrupted())
Why is this check inside the loop? Can we move this to the bottom of the 
hasNext method?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@108
PS17, Line 108: Kudu task interrupted
how about: "KuduBackup spark task interrupted", or something similar / more 
specific


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@115
PS17, Line 115: TODO
here and elsewhere: when you write TODO put TODO(grant) or TODO(KUDU-1234) or 
something like that so it's easier to keep track of the action item related to 
the TODO, per https://google.github.io/styleguide/cppguide.html#TODO_Comments 
and consistency with the rest of the Kudu code base.


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@43
PS17, Line 43: ins't
typo: isn't -- also add punctuation and TODO(...) to comments in this file too


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@43
PS17, Line 43: process
s/process/processed/


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
File 
java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala@69
PS17, Line 69:     opt[String]("format")
Why wouldn't this be stored in the metadata?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala@99
PS17, Line 99: Partitions
nit: s/Partitions/Partition/



--
To view, visit http://gerrit.cloudera.org:8080/10375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 17
Gerrit-Owner: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Tue, 19 Jun 2018 01:31:03 +0000
Gerrit-HasComments: Yes

Reply via email to