[kudu-CR] [examples] Add basic Spark example written in Scala

Mitch Barnett (Code Review) Thu, 25 Oct 2018 16:11:34 -0700

Mitch Barnett has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11788 )


Change subject: [examples] Add basic Spark example written in Scala
......................................................................


Patch Set 4: Code-Review+1

(11 comments)

Updated per Grant's feedback.

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@18
PS1, Line 18: = Kudu-Spark example README
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@40
PS1, Line 40: - KuduMasters: A String value consisting of a comma-separated 
list of Kudu Master Host addresses. This will need to be pointed to the Kudu 
cluster you wish to target.
> Is this needed? Given the code is creating the table, we have control over
Good call - probably better to give less control as to not cause additional 
failures.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/README.adoc@43
PS1, Line 43: To specify a value at execution time, you'll specify the 
parameter name in the '-D<name>' format. For example, to set a different set of 
masters for the Kudu cluster from the command line and use a custom table name, 
set the property `KuduMasters` to a CSV of the master addresses in the form 
`host:port` and add a table name value, as shown:
> nit: Move up near KUDU_MASTERS
I didn't see any default listed for this value - it's inclusion is only due to 
it being a requirement of the SparkSession create statement.

I suppose you could infer that 'local' is the default value, since any other 
value would require a spark master to be explicitly defined.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File 
examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@14
PS1, Line 14: object SparkExample {
> nit: val KuduMasters: String = ...
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@19
PS1, Line 19:   val NameCol = "name"
> Should this be configurable like KUDU_MASTERS?
I went ahead and made it configurable, as to allow those who want to the option 
of changing where it runs.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS1, Line 32:
> Nit: I think the below sytax is easier
Went ahead and made that syntax change - bit easier to read.

Also went ahead and explicitly defined 'false' for the structFields instead of 
the NULLABLE constant.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@39
PS1, Line 39:
> nit: Replication factor of 1 might be easier for examples.
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@43
PS1, Line 43:                   )
> Can this be moved up by the other imports.
From what I've read in the Spark documentation and what I've seen in the code 
base, it can't be moved.

It's being called directly from the SparkSession we instantiated above.
I could move it higher in main(), but I can't take it up to the other imports 
unfortunately.


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47
PS1, Line 47:       if (!kc.tableExists(TableName)) {
> Nit: Use slf4j logging.
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@77
PS1, Line 77:       //create a DF and map the values of the KuduMaster and 
TableName values
> No need to catch if there is no handling. Just let the exception bubble up.
Done


http://gerrit.cloudera.org:8080/#/c/11788/1/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@87
PS1, Line 87:       kc.deleteTable(TableName)
> No need to catch here either.
Done



--
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 4
Gerrit-Owner: Mitch Barnett <mbarn...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <mbarn...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 23:11:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example written in Scala

Reply via email to