Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )
Change subject: [backup] Add initial incremental backup/restore support ...................................................................... Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto File java/kudu-backup/src/main/protobuf/backup.proto: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto@118 PS2, Line 118: PartitionMetadataPB partitions = 8; > Skipped 8? Done http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@163 PS2, Line 163: * Node class to represent nodes in the backup graph. > Curious why you went with the noun "vertex"; isn't "node" the more commonly No specific reason, will change to Node. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@35 PS2, Line 35: */ > Nit: Javadoc not quite formatted properly? Done http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@58 PS2, Line 58: // use the `to_ms` metadata as the `from_ms` time for this backup. > Nit: too long? Done http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@144 PS2, Line 144: val columnCount = if (incremental) fieldCount - 1 else fieldCount > This function is called per-row, right? I wonder if it'd be more performant We could assume the column count is static across the life of an RDD and memoize this calculation, but the existing implementation didn't do that, so I kept the behavior the same. The additional if check should be cheap relative to the other work in the method. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@39 PS2, Line 39: */ > Formatting? Done http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@54 PS2, Line 54: // TODO: Make parallel so each table isn't processed serially. : options.tables.foreach { tableName => : val graph = io.readBackupGraph(tableName).filterByTim > Did you test what effect this has? If it may have a drastic impact on perfo This wasn't meant to be included, I was experimenting with it. I will remove it. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100 PS2, Line 100: // Generate an operation based on the row action. : val operation = rowAction match { : case RowAction.UPSERT => table.newUpsert() > Does Scala allow returning tuples like in Python? Yes it does > Seems like it'd be more elegant to return the row action separately from the > row itself. The Row abstraction is a result of the Spark framework. It expects Dataframes to work with Rows and is less flexible than the raw RDD API. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37 PS2, Line 37: tables: Seq[String], : rootPath: String, : kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName, > Why do these three need to be listed here and in RestoreOptions if both ext The CommonOptions trait is specifying that BackupOptions/RestoreOptions needs to implement a value function for tables, rootPath, and kuduMasterAddresses. But it doesn't specify "how". BackupOptions/RestoreOptions implement those functions via a constructor. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@93 PS2, Line 93: // TODO: Document the limitations based on cluster configuration > Nit: too long? Done http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala@59 PS2, Line 59: */ > Could you doc at least the interesting methods here? Done -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Mon, 15 Apr 2019 18:26:47 +0000 Gerrit-HasComments: Yes
