Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10375 )
Change subject: WIP: Kudu Backup/Restore Spark Jobs ...................................................................... Patch Set 14: (23 comments) http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@9 PS14, Line 9: Adds a rough base implementation of Kudu backup and restore Spark jobs. There are many todo’s indicating gaps and more testing and details to be be finished. However, these base jobs work and are in a functional state that can be committed and iterated on as we build up and improve our backup functionality. Nit: please reformat the commit message so it adheres to our style guide's 80 chars per line. http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@9 PS14, Line 9: todo’s Nit: todos http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@13 PS14, Line 13: each tables In the "per table" directory, perhaps? It's a directory per table, right? http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@13 PS14, Line 13: . T Nit: as written, you should combine these two sentences: "in any spark compatible format, the defaults being HDFS and Parquet" http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@15 PS14, Line 15: The restore job can read the data and metadata generated. Create “restore” tables with a matching schema and reload the data. The second sentence is a fragment; consider joining it to the first: "...metadata generated, and create "restore" tables..." http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/protobuf/backup.proto File java/kudu-backup/src/main/protobuf/backup.proto: PS12: > Do you have a suggestion what to include so that the doc enhances the descr The docs should explain to what logical Kudu concept each field is mapped, and how they all work together. http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/protobuf/backup.proto@28 PS12, Line 28: > Done Can you move TableMetadataPB to the bottom? http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto File java/kudu-backup/src/main/protobuf/backup.proto: http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@22 PS14, Line 22: syntax = "proto3"; The default in proto3 is for each field to be optional, right? http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@40 PS14, Line 40: differententiate Nit: differentiate. http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@58 PS14, Line 58: string column = 1; Is this a column name? An index? An ID? http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@59 PS14, Line 59: string value = 2; Is this value encoded? Decoded? http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@68 PS14, Line 68: repeated string columns = 1; Column names? Indexes? IDs? http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@73 PS14, Line 73: repeated string columns = 1; Same. http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66 PS14, Line 66: prevent's Nit: prevents http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala: http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@41 PS12, Line 41: > Should explain what this means w.r.t. the ancient history watermark. Still to do? http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala: http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@38 PS12, Line 38: sc > KuduContext calls it `sc` too. What I meant was, the KuduContext argument to this function is named 'kuduContext', so perhaps the SparkContext variable should be named 'sparkContext' for symmetry. http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@87 PS12, Line 87: > Nit: indentation. Still to do. http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@91 PS12, Line 91: > Not really understanding the purpose of this wrapper. Doc? Still to do. http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@97 PS12, Line 97: override def hasNext: Boolean = { > Nit: switch this around so the overall condition is more clear. Still to do. http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@43 PS14, Line 43: ins't Nit: isn't http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala: http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala@28 PS12, Line 28: kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName, > How might this be formatted? Could you provide more guidance? Done http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala File java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@47 PS12, Line 47: > It is a constant in KuduRestoreOptions.defaultTableSuffix OK, can you reference that constant here then? http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-client/src/main/java/org/apache/kudu/Type.java File java/kudu-client/src/main/java/org/apache/kudu/Type.java: http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-client/src/main/java/org/apache/kudu/Type.java@194 PS14, Line 194: { Nit: should add a space before this. In the below condition too. -- To view, visit http://gerrit.cloudera.org:8080/10375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7 Gerrit-Change-Number: 10375 Gerrit-PatchSet: 14 Gerrit-Owner: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Wed, 13 Jun 2018 18:52:15 +0000 Gerrit-HasComments: Yes