Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: WIP: [backup] Add incremental backup/restore support
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto@118
PS2, Line 118:   int32 version = 9;
Skipped 8?

Since this is a new message type, I'd actually make this the first field and 
move it to the top, to increase its visibility.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@163
PS2, Line 163:  * Vertex class to represent vertices in the backup graph.
Curious why you went with the noun "vertex"; isn't "node" the more commonly 
used term in CS?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@35
PS2, Line 35:  * */
Nit: Javadoc not quite formatted properly?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@58
PS2, Line 58:       // find the previous backup and use the `to_ms` metadata as 
the `from_ms` time for this backup.
Nit: too long?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@144
PS2, Line 144:     val columnCount = if (incremental) fieldCount - 1 else 
fieldCount
This function is called per-row, right? I wonder if it'd be more performant to 
have a "full" version and an "incremental" version rather than checking "if 
incremental" with each call.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@39
PS2, Line 39:  * */
Formatting?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@54
PS2, Line 54:     // Disable file splitting to keep one Spark task per Kudu 
partition
            :     // and reduce compaction when restoring.
            :     session.conf.set(ParquetInputFormat.SPLIT_FILES, false)
Did you test what effect this has? If it may have a drastic impact on 
performance, may want to separate it into its own patch so it's not buried in 
this large incremental backup patch.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:               // Get the operation type based on the change type 
column.
             :               // This will always be the last column in the row.
             :               val rowActionValue = row.getByte(row.length - 1)
Does Scala allow returning tuples like in Python? Seems like it'd be more 
elegant to return the row action separately from the row itself. Maybe more 
efficient too; you wouldn't need to convert it from an enum into a byte and 
back.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37:     tables: Seq[String],
            :     rootPath: String,
            :     kuduMasterAddresses: String = 
InetAddress.getLocalHost.getCanonicalHostName,
Why do these three need to be listed here and in RestoreOptions if both extend 
CommonOptions?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@93
PS2, Line 93:         // TODO: Document the limitations based on cluster 
configuration (ex: ancient history watermark).
Nit: too long?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala@59
PS2, Line 59: class SessionIO(val session: SparkSession, options: 
CommonOptions) {
Could you doc at least the interesting methods here?



--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Mon, 15 Apr 2019 04:21:22 +0000
Gerrit-HasComments: Yes

Reply via email to