[kudu-CR] [examples] Add basic Spark example (scala)

Will Berkeley (Code Review) Thu, 01 Nov 2018 11:01:05 -0700

Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11788 )


Change subject: [examples] Add basic Spark example (scala)
......................................................................


Patch Set 10:

(15 comments)

I tested this locally with a multimaster cluster and with a YARN cluster 
running Spark and a single-master Kudu cluster. It worked as expected in both 
cases.

However, I think we should remove the ability to run the jar standalone and 
require spark-submit to be used. This is, AFAICT, typical for Spark 
applications, and it simplifies the configuration of the job.

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc
File examples/scala/spark-example/README.adoc:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@32
PS10, Line 32: To build and run, ensure maven is installed and from the 
spark-example directory
             : run
Can you specify what you need for this to work? Do you need Spark running 
somewhere? A Kudu cluster somewhere?


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@38
PS10, Line 38: $ java -jar target/kudu-spark-example-1.0-SNAPSHOT.jar
After poking around the internet for a bit, I think we shouldn't try to make 
this runnable standalone. It should require spark-submit. If we do that, we can 
remove the very awkward double specifying of the Spark master.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@43
PS10, Line 43: Host
Remove.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@70
PS10, Line 70: \
You can't break the line in the middle of a single-quotes literal, since every 
character inside them is treated literally by the shell.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala
File 
examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala:

http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@17
PS10, Line 17: org.apache.kudu.examples
I think this should be org.apache.kudu.spark.examples.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32
PS10, Line 32:   val kuduMasters: String = 
System.getProperty("kuduMasters","localhost:7051")   // Kudu master address 
list.
Spaces after commas here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@34
PS10, Line 34: val sparkMaster: String = 
System.getProperty("sparkMaster","local")
If we only support running this example with spark-submit, which seems to be 
the only thing done in practice, we don't need this parameter, and invoking the 
example will be less awkward because the Spark master will only be specified 
once, and in the normal Spark way.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@40
PS10, Line 40: Defining
Define


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@44
PS10, Line 44:   case class User(name:String,id:Int)
nit: space here


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47
PS10, Line 47:
nit: remove extra line


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@51
PS10, Line 51: SparkExample
KuduSparkExample


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@54
PS10, Line 54: Importing
Import


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@67
PS10, Line 67: 1 replica per tablet
Let's do the default number of replicas (i.e. remove setting the number of 
replicas explicitly). That lowers the chance for funny business if somebody 
copies this and builds it into a production job.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@102
PS10, Line 102:       // Clean up
.


http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@103
PS10, Line 103: $tableName
nit: Here and elsewhere, surround tableName in single quotes.



--
To view, visit http://gerrit.cloudera.org:8080/11788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f
Gerrit-Change-Number: 11788
Gerrit-PatchSet: 10
Gerrit-Owner: Mitch Barnett <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Greg Solovyev <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Thu, 01 Nov 2018 18:00:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] [examples] Add basic Spark example (scala)

Reply via email to