[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-25 Thread Grant Henke (Code Review)
Grant Henke has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..

Kudu Backup/Restore Spark Jobs

Adds a rough base implementation of Kudu backup and restore
Spark jobs. There are many TODOs indicating gaps, and more testing
and details remain to be finished. However, these base jobs work and are
in a functional state that can be committed and iterated on as we
build up and improve our backup functionality.

These jobs, as annotated, should be considered private, unstable,
and experimental.

The backup job can output the data of one or more tables to any
Spark-compatible path in any Spark-compatible format, the defaults
being HDFS and Parquet. Each table’s data is written in a subdirectory
of the provided path. The subdirectory’s name is the URL-encoded table
name. Additionally, in each table’s directory, a JSON metadata file is
output with the metadata needed to recreate the exported table when
restoring.
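
As a rough illustration of the layout described above, the following Scala sketch derives a per-table directory from a root path. The helper name `tableDir`, the example root path, and the choice of `java.net.URLEncoder` with UTF-8 are assumptions made for illustration; the patch itself may encode names differently.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

object BackupLayoutSketch {
  // Hypothetical helper: maps a table name to its subdirectory under the
  // backup root by URL-encoding the name (assumed encoding: UTF-8).
  def tableDir(rootPath: String, tableName: String): String = {
    val encoded = URLEncoder.encode(tableName, StandardCharsets.UTF_8.name())
    s"${rootPath.stripSuffix("/")}/$encoded"
  }
}
```

Encoding the name keeps characters such as `:` or spaces, which can appear in table names, from being interpreted as path syntax.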

The restore job can read the data and metadata generated and create
“restore” tables with a matching schema and reload the data.

The job arguments are a work in progress and will likely be enhanced
and simplified as we find what is useful and what isn’t through
performance and functional testing. More documentation will be
generated when the jobs are ready for general use.

Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Reviewed-on: http://gerrit.cloudera.org:8080/10375
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy 
---
M java/gradle/dependencies.gradle
A java/kudu-backup/build.gradle
A java/kudu-backup/pom.xml
A java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/resources/log4j.properties
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Type.java
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M java/pom.xml
M java/settings.gradle
18 files changed, 1,594 insertions(+), 7 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/10375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 21
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-25 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..


Patch Set 20: Code-Review+2


--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 20
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 25 Jun 2018 19:31:48 +
Gerrit-HasComments: No


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-25 Thread Grant Henke (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10375

to look at the new patch set (#20).

Change subject: Kudu Backup/Restore Spark Jobs
..

Kudu Backup/Restore Spark Jobs

Adds a rough base implementation of Kudu backup and restore
Spark jobs. There are many TODOs indicating gaps, and more testing
and details remain to be finished. However, these base jobs work and are
in a functional state that can be committed and iterated on as we
build up and improve our backup functionality.

These jobs, as annotated, should be considered private, unstable,
and experimental.

The backup job can output the data of one or more tables to any
Spark-compatible path in any Spark-compatible format, the defaults
being HDFS and Parquet. Each table’s data is written in a subdirectory
of the provided path. The subdirectory’s name is the URL-encoded table
name. Additionally, in each table’s directory, a JSON metadata file is
output with the metadata needed to recreate the exported table when
restoring.

The restore job can read the data and metadata generated and create
“restore” tables with a matching schema and reload the data.

The job arguments are a work in progress and will likely be enhanced
and simplified as we find what is useful and what isn’t through
performance and functional testing. More documentation will be
generated when the jobs are ready for general use.

Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
---
M java/gradle/dependencies.gradle
A java/kudu-backup/build.gradle
A java/kudu-backup/pom.xml
A java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/resources/log4j.properties
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Type.java
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M java/pom.xml
M java/settings.gradle
18 files changed, 1,594 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/10375/20
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 20
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-20 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..


Patch Set 19:

(7 comments)

Looks good, just nits.

http://gerrit.cloudera.org:8080/#/c/10375/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10375/19//COMMIT_MSG@22
PS19, Line 22: tables
Nit: table's


http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto@56
PS19, Line 56: encoded human-readable
Aren't these contradictory? I'd expect an encoded value to be human unreadable, 
vs. a value that had been decoded using the table's schema. BTW here "encoded" 
refers to Kudu-based encoding, not something like UTF-8.


http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto@109
PS19, Line 109:   // The number of replicas this table has.
Nit: technically, it's the number of replicas of each tablet in the table. Or 
you can call it the "replication factor" of the table.


http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto@111
PS19, Line 111:   // The metadata for the tables columns.
Nit: table's


http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto@113
PS19, Line 113: tables
Nit: table's


http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala:

http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@37
PS19, Line 37:   val DefaultFormat: String = "parquet"
 :   val DefaultScanBatchSize: Int = 1024*1024*20 // 20 MiB
 :   val DefaultScanRequestTimeout: Long = AsyncKuduClient.DEFAULT_OPERATION_TIMEOUT_MS // 30 seconds
 :   val DefaultScanPrefetching: Boolean = false // TODO: Add a test per KUDU-1260 and enable by default?
If these are equivalent to "public static final ..." in Java, should they be 
UPPER_CASE too?
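
For reference, a minimal sketch of the convention in question. The Scala style guide recommends UpperCamelCase for constants. One practical consequence is that a capitalized `val` can be used as a stable identifier in a pattern match, whereas a lowercase name in the same position would be treated as a fresh variable binding. The object and method names below are hypothetical and not taken from the patch.

```scala
object NamingSketch {
  // A constant in UpperCamelCase, as the Scala style guide suggests.
  val DefaultFormat: String = "parquet"

  // Because the name starts with an upper-case letter, the first case
  // compares against the constant instead of binding a new variable.
  def isDefault(format: String): Boolean = format match {
    case DefaultFormat => true
    case _ => false
  }
}
```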


http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala:

http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala@34
PS19, Line 34:   val DefaultTableSuffix: String = "-restore"
 :   val DefaultCreateTables: Boolean = true
Same question.



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 19
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 20 Jun 2018 21:16:26 +
Gerrit-HasComments: Yes


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-20 Thread Grant Henke (Code Review)
Grant Henke has removed a vote on this change.

Change subject: Kudu Backup/Restore Spark Jobs
..


Removed Verified-1 by Kudu Jenkins (120)
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 19
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-20 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..


Patch Set 19: Verified+1


--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 19
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 20 Jun 2018 14:25:27 +
Gerrit-HasComments: No


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-19 Thread Grant Henke (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10375

to look at the new patch set (#19).

Change subject: Kudu Backup/Restore Spark Jobs
..

Kudu Backup/Restore Spark Jobs

Adds a rough base implementation of Kudu backup and restore
Spark jobs. There are many TODOs indicating gaps, and more testing
and details remain to be finished. However, these base jobs work and are
in a functional state that can be committed and iterated on as we
build up and improve our backup functionality.

These jobs, as annotated, should be considered private, unstable,
and experimental.

The backup job can output one to many tables data to any spark
compatible path in any spark compatible format, the defaults being
HDFS and Parquet. Each table’s data is written in a subdirectory of
the provided path. The subdirectory’s name is the url encoded table
name. Additionally in each tables directory a json metadata file is
output with the metadata needed to recreate the table that was
exported when restoring.

The restore job can read the data and metadata generated and create
“restore” tables with a matching schema and reload the data.

The job arguments are a work in progress and will likely be enhanced
and simplified as we find what is useful and what isn’t through
performance and functional testing. More documentation will be
generated when the jobs are ready for general use.

Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
---
M java/gradle/dependencies.gradle
A java/kudu-backup/build.gradle
A java/kudu-backup/pom.xml
A java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/resources/log4j.properties
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Type.java
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M java/pom.xml
M java/settings.gradle
18 files changed, 1,594 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/10375/19
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 19
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-19 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..


Patch Set 18:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10375/18/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/10375/18/java/kudu-backup/src/main/protobuf/backup.proto@65
PS18, Line 65: The number of values must be <= the number of columns
The number of values equals the number of columns in the range partition key


http://gerrit.cloudera.org:8080/#/c/10375/18/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
File java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/10375/18/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@41
PS18, Line 41:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/10375/18/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@41
PS18, Line 41:
extra space


http://gerrit.cloudera.org:8080/#/c/10375/18/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@113
PS18, Line 113:   // TODO: Move to a PartitionSchema equals/equivalent method
Missing punctuation here and in other TODOs in this file


http://gerrit.cloudera.org:8080/#/c/10375/18/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@126
PS18, Line 126: Has
hash


http://gerrit.cloudera.org:8080/#/c/10375/18/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@281
PS18, Line 281: row
does this do anything here?



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 18
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 19 Jun 2018 21:49:07 +
Gerrit-HasComments: Yes


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-19 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..


Patch Set 17:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@32
PS17, Line 32:
> nit: use 2-space indentation in this file for consistency with other .proto
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@46
PS17, Line 46: default
> let's name this default_value since DEFAULT is a keyword in protobuf 2 and
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@52
PS17, Line 52: readible
> readable
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@63
PS17, Line 63: endoded
> encoded
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@65
PS17, Line 65: repeated ColumnValueMetadataPB lower_bounds = 1;
 : repeated ColumnValueMetadataPB upper_bounds = 2;
> Doc why these are repeated and what invariants we have (number of component
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@98
PS17, Line 98: // The starting point of a backup. A UNIX timestamp in milliseconds since the epoch.
 : int64 from_ms = 1;
 : // The end point of a backup. A UNIX timestamp in milliseconds since the epoch.
 : int64 to_ms = 2;
> What is the purpose of storing these?
This isn't the start time and end time of the job. It is the time used to
dictate the snapshot scan. There is a comment where this is used explaining
that from_ms is always 0 until we introduce incremental backups.

We store these values so that we can know where a backup left off and properly
set the time for a snapshot scan in follow-up incremental jobs.
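
To make the intent concrete, here is a hedged sketch of how a stored from/to pair could drive a later incremental job. Incremental backups are not part of this patch, so the `BackupWindow` type and the `full` and `next` helpers are hypothetical names used only to illustrate the idea that a follow-up job starts where the previous one ended.

```scala
object BackupWindowSketch {
  // Hypothetical model of the stored from_ms/to_ms pair
  // (UNIX timestamps in milliseconds since the epoch).
  final case class BackupWindow(fromMs: Long, toMs: Long) {
    require(fromMs <= toMs, "fromMs must not exceed toMs")
  }

  // A full backup starts at 0 and ends at the snapshot time of the scan.
  def full(snapshotMs: Long): BackupWindow = BackupWindow(0L, snapshotMs)

  // A follow-up incremental job would begin where the previous one left off.
  def next(previous: BackupWindow, snapshotMs: Long): BackupWindow =
    BackupWindow(previous.toMs, snapshotMs)
}
```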


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@103
PS17, Line 103: name
> Can we name this table_name?
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@50
PS17, Line 50: TODO: Take parameter for the SaveMode
> nit: Add a period or punctuation to the end of all your comments per the C+
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66
PS17, Line 66: false
> small suggestion: /* overwrite= */ false
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@37
PS17, Line 37: defaultFormat
> nit: Shouldn't these constants be in UpperCamelCase per https://docs.scala-
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@42
PS17, Line 42:   // TODO: clean up usage output
> nit: punctuation in this file too
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@55
PS17, Line 55: timestampMs
> should we specify this as microseconds since that's the resolution we can a
Anything more precise than milliseconds seemed over the top for command line
interaction. I suspect most users would just be converting from seconds or
milliseconds.
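
As a small sketch of the trade-off: accepting milliseconds on the command line does not prevent using a finer unit internally, because widening the value is a one-line conversion. The object and method names below are hypothetical, and how the patch actually derives its scan timestamp is not shown here.

```scala
import java.util.concurrent.TimeUnit

object TimestampSketch {
  // Converts a user-supplied millisecond timestamp to microseconds.
  def millisToMicros(timestampMs: Long): Long =
    TimeUnit.MILLISECONDS.toMicros(timestampMs)
}
```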


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@45
PS17, Line 45: // Set a hybrid time for the scan to ensure application consistency
> nit: comment punctuation here and elsewhere
Done


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@46
PS17, Line 46: options.timestampMs
> this is an optional parameter, we should either handle the non-specified ca
It's defaulted to System.currentTimeMillis() in KuduBackupOptions.


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@57
PS17, Line 57: .snapshotTimestampRaw(hybridTime)
> Do we store this? We 

[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-19 Thread Grant Henke (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10375

to look at the new patch set (#18).

Change subject: Kudu Backup/Restore Spark Jobs
..

Kudu Backup/Restore Spark Jobs

Adds a rough base implementation of Kudu backup and restore
Spark jobs. There are many TODOs indicating gaps, and more testing
and details remain to be finished. However, these base jobs work and are
in a functional state that can be committed and iterated on as we
build up and improve our backup functionality.

These jobs, as annotated, should be considered private, unstable,
and experimental.

The backup job can output one to many tables data to any spark
compatible path in any spark compatible format, the defaults being
HDFS and Parquet. Each table’s data is written in a subdirectory of
the provided path. The subdirectory’s name is the url encoded table
name. Additionally in each tables directory a json metadata file is
output with the metadata needed to recreate the table that was
exported when restoring.

The restore job can read the data and metadata generated and create
“restore” tables with a matching schema and reload the data.

The job arguments are a work in progress and will likely be enhanced
and simplified as we find what is useful and what isn’t through
performance and functional testing. More documentation will be
generated when the jobs are ready for general use.

Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
---
M java/gradle/dependencies.gradle
A java/kudu-backup/build.gradle
A java/kudu-backup/pom.xml
A java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/resources/log4j.properties
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Type.java
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M java/pom.xml
M java/settings.gradle
18 files changed, 1,594 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/10375/18
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 18
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-18 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..


Patch Set 17:

(25 comments)

Overall looks good, all minor feedback.

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@32
PS17, Line 32:
nit: use 2-space indentation in this file for consistency with other .proto 
files in the code base


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@46
PS17, Line 46: default
let's name this default_value since DEFAULT is a keyword in protobuf 2 and it 
might be a little confusing


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@52
PS17, Line 52: readible
readable


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@63
PS17, Line 63: endoded
encoded


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@65
PS17, Line 65: repeated ColumnValueMetadataPB lower_bounds = 1;
 : repeated ColumnValueMetadataPB upper_bounds = 2;
Doc why these are repeated and what invariants we have (number of components in 
lower_bounds and upper_bounds equals number of columns involved in the range 
partition)
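
The invariant requested here can be sketched as a simple size check. The case classes below are hypothetical stand-ins for the protobuf messages, not the generated classes, and the sketch assumes every bound is fully specified (unbounded ranges are not modeled).

```scala
object RangeBoundsSketch {
  // Hypothetical stand-in for one component of a range partition bound.
  final case class ColumnValue(columnName: String, value: String)

  // Hypothetical stand-in for a lower/upper bound pair.
  final case class RangeBounds(lower: Seq[ColumnValue], upper: Seq[ColumnValue])

  // True when both bounds carry exactly one value per range partition column.
  def satisfiesInvariant(bounds: RangeBounds, numRangeColumns: Int): Boolean =
    bounds.lower.size == numRangeColumns && bounds.upper.size == numRangeColumns
}
```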


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@98
PS17, Line 98: // The starting point of a backup. A UNIX timestamp in milliseconds since the epoch.
 : int64 from_ms = 1;
 : // The end point of a backup. A UNIX timestamp in milliseconds since the epoch.
 : int64 to_ms = 2;
What is the purpose of storing these?

Maybe this would be better expressed as startTime and duration?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/protobuf/backup.proto@103
PS17, Line 103: name
Can we name this table_name?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@50
PS17, Line 50: TODO: Take parameter for the SaveMode
nit: Add a period or punctuation to the end of all your comments per the C++ 
style guide @ 
https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar
 (I know this is not C++) and for consistency with the rest of the Kudu code 
base.

Here and in the rest of this patch.


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66
PS17, Line 66: false
small suggestion: /* overwrite= */ false

instead of the trailing comment


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@37
PS17, Line 37: defaultFormat
nit: Shouldn't these constants be in UpperCamelCase per 
https://docs.scala-lang.org/style/naming-conventions.html#constants-values-variable-and-methods
 ?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@42
PS17, Line 42:   // TODO: clean up usage output
nit: punctuation in this file too


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@55
PS17, Line 55: timestampMs
should we specify this as microseconds since that's the resolution we can 
accept internally anyway?


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@45
PS17, Line 45: // Set a hybrid time for the scan to ensure application consistency
nit: comment punctuation here and elsewhere


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@46
PS17, Line 46: options.timestampMs
this is an optional parameter, we should either handle the non-specified case 
or add a TODO for that


http://gerrit.cloudera.org:8080/#/c/10375/17/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@57
PS17, Line 57: .snapshotTimestampRaw(hybridTime)
Do we store this? We need to record the chosen timestamp 

[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-18 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10375/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10375/16//COMMIT_MSG@22
PS16, Line 22: Additionally in the each tables directory
> This still doesn't read correctly; what is "the each tables directory"?
Done


http://gerrit.cloudera.org:8080/#/c/10375/16/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

PS16:
> Just remembered that, stylistically, we use underscore_delimited_names for
Done


http://gerrit.cloudera.org:8080/#/c/10375/16/java/kudu-backup/src/main/protobuf/backup.proto@77
PS16, Line 77: int64 fromMs = 1;
 : int64 toMs = 2;
> What do these mean? Are they quantities of msec from the UNIX epoch? Please
Done


http://gerrit.cloudera.org:8080/#/c/10375/16/java/kudu-backup/src/main/protobuf/backup.proto@80
PS16, Line 80: int32 replicas = 4;
> Nit: would prefer num_replicas.
Done



--
To view, visit http://gerrit.cloudera.org:8080/10375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 16
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 18 Jun 2018 17:18:02 +
Gerrit-HasComments: Yes


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-18 Thread Grant Henke (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10375

to look at the new patch set (#17).

Change subject: Kudu Backup/Restore Spark Jobs
..

Kudu Backup/Restore Spark Jobs

Adds a rough base implementation of Kudu backup and restore
Spark jobs. There are many todos indicating gaps and more testing
and details to be finished.  However, these base jobs work and are
in a functional state that can be committed and iterated on as we
build up and improve our backup functionality.

These jobs, as annotated, should be considered private, unstable,
and experimental.

The backup job can output the data of one or more tables to any
Spark-compatible path in any Spark-compatible format, the defaults being
HDFS and Parquet. Each table’s data is written in a subdirectory of
the provided path. The subdirectory’s name is the URL-encoded table
name. Additionally, in each table’s directory a JSON metadata file is
output with the metadata needed to recreate the exported table
when restoring.
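
The directory layout described above could be sketched as follows. BackupLayoutSketch and the metadata file name are hypothetical; the patch itself may name things differently. Only java.net.URLEncoder and URLDecoder from the JDK are used.

```scala
import java.net.{URLDecoder, URLEncoder}
import java.nio.charset.StandardCharsets

object BackupLayoutSketch {
  private val Utf8 = StandardCharsets.UTF_8.name()

  // Subdirectory for one table: the URL-encoded table name under the root path.
  def tableDir(rootPath: String, tableName: String): String =
    s"${rootPath.stripSuffix("/")}/${URLEncoder.encode(tableName, Utf8)}"

  // Recover the table name from a subdirectory name when restoring.
  def tableNameFromDir(dirName: String): String =
    URLDecoder.decode(dirName, Utf8)

  // Hypothetical metadata file name; the real job may use another one.
  def metadataFile(rootPath: String, tableName: String): String =
    s"${tableDir(rootPath, tableName)}/metadata.json"
}
```

Encoding the name keeps characters such as ":" and "/" in a table name from being interpreted as part of the path.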

The restore job can read the data and metadata generated and create
“restore” tables with a matching schema and reload the data.

The job arguments are a work in progress and will likely be enhanced
and simplified as we find what is useful and what isn’t through
performance and functional testing. More documentation will be
generated when the jobs are ready for general use.

Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
---
M java/gradle/dependencies.gradle
A java/kudu-backup/build.gradle
A java/kudu-backup/pom.xml
A java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/resources/log4j.properties
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Type.java
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M java/pom.xml
M java/settings.gradle
18 files changed, 1,595 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/10375/17
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 17
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-13 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10375/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10375/16//COMMIT_MSG@22
PS16, Line 22: Additionally in the each tables directory
This still doesn't read correctly; what is "the each tables directory"?


http://gerrit.cloudera.org:8080/#/c/10375/16/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

PS16:
Just remembered that, stylistically, we use underscore_delimited_names for 
proto fields, not camelCase. See src/kudu/common/common.proto for an example.
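
For context, protobuf code generators derive camelCase accessors from underscore-delimited field names (for example, num_replicas becomes getNumReplicas() in generated Java), so the calling code keeps its camelCase style either way. The helper below is a simplified, hypothetical approximation of that mapping and does not reproduce the generator's exact rules.

```scala
object ProtoNamingSketch {
  // Simplified approximation: drop underscores and capitalize every
  // segment after the first. Edge cases (digits, leading underscores)
  // are not handled the way protoc handles them.
  def toLowerCamel(fieldName: String): String = {
    val parts = fieldName.split("_").filter(_.nonEmpty)
    parts.headOption
      .map(head => head + parts.tail.map(_.capitalize).mkString)
      .getOrElse("")
  }
}
```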


http://gerrit.cloudera.org:8080/#/c/10375/16/java/kudu-backup/src/main/protobuf/backup.proto@77
PS16, Line 77: int64 fromMs = 1;
 : int64 toMs = 2;
What do these mean? Are they quantities of msec from the UNIX epoch? Please doc 
this, and please doc the rest of the messages/fields as I suggested earlier.
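
If the answer is yes, the documented meaning could be made explicit along these lines. TimeRangeSketch is a hypothetical type, and the sketch assumes both fields are milliseconds since the UNIX epoch with fromMs no later than toMs.

```scala
import java.time.Instant

// Hypothetical wrapper; assumes both values are milliseconds since the
// UNIX epoch (1970-01-01T00:00:00Z).
final case class TimeRangeSketch(fromMs: Long, toMs: Long) {
  require(fromMs <= toMs, s"fromMs ($fromMs) must not be after toMs ($toMs)")

  def from: Instant = Instant.ofEpochMilli(fromMs)
  def to: Instant = Instant.ofEpochMilli(toMs)
}
```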


http://gerrit.cloudera.org:8080/#/c/10375/16/java/kudu-backup/src/main/protobuf/backup.proto@80
PS16, Line 80: int32 replicas = 4;
Nit: would prefer num_replicas.



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 16
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 13 Jun 2018 20:33:09 +
Gerrit-HasComments: Yes


[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-13 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10375 )

Change subject: Kudu Backup/Restore Spark Jobs
..


Patch Set 16:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@9
PS14, Line 9: Adds a rough base implementation of Kudu backup and restore
> Nit: please reformat the commit message so it adheres to our style guide's
Done


http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@9
PS14, Line 9:
> Nit: todos
Done


http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@13
PS14, Line 13:
> Nit: as written, you should combine these two sentences: "in any spark comp
Done


http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@13
PS14, Line 13:
> In the "per table" directory, perhaps? It's a directory per table, right?
Done


http://gerrit.cloudera.org:8080/#/c/10375/14//COMMIT_MSG@15
PS14, Line 15: These jobs, as annotated, should be considered private, unstable,
> The second sentence is a fragment; consider joining it to the first: "...me
Done


http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/protobuf/backup.proto@28
PS12, Line 28:
> Can you move TableMetadataPB to the bottom?
Done


http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@22
PS14, Line 22: syntax = "proto3";
> The default in proto3 is for each field to be optional, right?
Explicit "optional" keywords are disallowed in proto3 syntax, as fields are
optional by default; required fields are no longer supported.

Proto3 also removed field presence tracking for primitive fields, as well as
custom default values. This is why I use StringValue below.
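
The presence problem can be modeled in plain Scala without the protobuf library. The types below are hypothetical stand-ins: PlainField behaves like a proto3 scalar string, where an unset field reads as the empty string, and WrappedField behaves like a wrapper message, where absence is explicit.

```scala
object FieldPresenceSketch {
  // Models a proto3 scalar: unset and "" are indistinguishable.
  final case class PlainField(comment: String = "")

  // Models a wrapper-typed field: None means the field was never set.
  final case class WrappedField(comment: Option[String] = None)

  def describe(field: WrappedField): String = field.comment match {
    case None        => "unset"
    case Some("")    => "set to empty"
    case Some(value) => s"set to $value"
  }
}
```

With the plain form, a restore cannot tell a missing value from a deliberately empty one; the wrapped form keeps that distinction.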


http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@40
PS14, Line 40: differentiate be
> Nit: differentiate.
Done


http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@58
PS14, Line 58: }
> Is this a column name? An index? An ID?
Done


http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@59
PS14, Line 59:
> Is this value encoded? Decoded?
Done


http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@68
PS14, Line 68: int32 seed = 3;
> Column names? Indexes? IDs?
Done


http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/protobuf/backup.proto@73
PS14, Line 73: RangePartitionMetadataPB rangePartitions = 2;
> Same.
Done


http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/10375/14/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66
PS14, Line 66: prevents
> Nit: prevents
Done


http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala:

http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@41
PS12, Line 41:
> Still to do?
Yeah. Because I am not sure what our final limitations are going to be, and
right now that isn't a public "concept", I will put a TODO comment.


http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@38
PS12, Line 38: sc
> What I meant was, the KuduContext argument to this function is named 'kuduC
Oh, that's actually a pain, because it collides with a variable in the RDD 
class we are extending.

I will note, ideally we would just use the context included in kuduContext, but 
it has serialization issues, so we can't.


http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@87
PS12, Line 87: 
> Nit: indentation.
Done


http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@87
PS12, Line 87:
> Still to do.
Done


http://gerrit.cloudera.org:8080/#/c/10375/12/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@91
PS12, Line 91:
> Still to do.
Done



[kudu-CR] Kudu Backup/Restore Spark Jobs

2018-06-13 Thread Grant Henke (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10375

to look at the new patch set (#16).

Change subject: Kudu Backup/Restore Spark Jobs
..

Kudu Backup/Restore Spark Jobs

Adds a rough base implementation of Kudu backup and restore
Spark jobs. There are many todos indicating gaps and more testing
and details to be finished.  However, these base jobs work and are
in a functional state that can be committed and iterated on as we
build up and improve our backup functionality.

These jobs, as annotated, should be considered private, unstable,
and experimental.

The backup job can output one to many tables data to any spark
compatible path in any spark compatible format, the defaults being
HDFS and Parquet. Each table’s data is written in a subdirectory of
the provided path. The subdirectory’s name is the url encoded table
name. Additionally in the each tables directory a json metadata file is
output with the metadata needed to recreate the table that was
exported when restoring.

The restore job can read the data and metadata generated and create
“restore” tables with a matching schema and reload the data.

The job arguments are a work in progress and will likely be enhanced
and simplified as we find what is useful and what isn’t through
performance and functional testing. More documentation will be
generated when the jobs are ready for general use.

Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
---
M java/gradle/dependencies.gradle
A java/kudu-backup/build.gradle
A java/kudu-backup/pom.xml
A java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/resources/log4j.properties
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Type.java
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M java/pom.xml
M java/settings.gradle
18 files changed, 1,568 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/10375/16
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 16
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon