Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11788 )
Change subject: [examples] Add basic Spark example (scala) ...................................................................... Patch Set 10: (15 comments) I tested this locally with a multimaster cluster and with a YARN cluster running Spark and a single-master Kudu cluster. It worked as expected in both cases. However, I think we should remove the ability to run the jar standalone and require spark-submit to be used. This is, AFAICT, typical for Spark applications, and it simplifies the configuration of the job. http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc File examples/scala/spark-example/README.adoc: http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@32 PS10, Line 32: To build and run, ensure maven is installed and from the spark-example directory : run Can you specify what you need for this to work? Do you need Spark running somewhere? A Kudu cluster somewhere? http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@38 PS10, Line 38: $ java -jar target/kudu-spark-example-1.0-SNAPSHOT.jar After poking around the internet for a bit, I think we shouldn't try to make this runnable standalone. It should require spark-submit. If we do that, we can remove the very awkward double specifying of the Spark master. http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@43 PS10, Line 43: Host Remove. http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/README.adoc@70 PS10, Line 70: \ You can't break the line in the middle of a single-quotes literal, since every character inside them is treated literally by the shell. http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala File examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala: http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@17 PS10, Line 17: org.apache.kudu.examples I think this should be org.apache.kudu.spark.examples. http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@32 PS10, Line 32: val kuduMasters: String = System.getProperty("kuduMasters","localhost:7051") // Kudu master address list. Spaces after commas here and elsewhere. http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@34 PS10, Line 34: val sparkMaster: String = System.getProperty("sparkMaster","local") If we only support running this example with spark-submit, which seems to be the only thing done in practice, we don't need this parameter, and invoking the example will be less awkward because the Spark master will only be specified once, and in the normal Spark way. http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@40 PS10, Line 40: Defining Define http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@44 PS10, Line 44: case class User(name:String,id:Int) nit: space here http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@47 PS10, Line 47: nit: remove extra line http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@51 PS10, Line 51: SparkExample KuduSparkExample http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@54 PS10, Line 54: Importing Import http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@67 PS10, Line 67: 1 replica per tablet Let's do the default number of replicas (i.e. remove setting the number of replicas explicitly). That lowers the chance for funny business if somebody copies this and builds it into a production job. http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@102 PS10, Line 102: // Clean up . http://gerrit.cloudera.org:8080/#/c/11788/10/examples/scala/spark-example/src/main/scala/org/apache/kudu/examples/SparkExample.scala@103 PS10, Line 103: $tableName nit: Here and elsewhere, surround tableName in single quotes. -- To view, visit http://gerrit.cloudera.org:8080/11788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9ba09f0118c054a07b951e241c31d66245c57d3f Gerrit-Change-Number: 11788 Gerrit-PatchSet: 10 Gerrit-Owner: Mitch Barnett <mbarn...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Attila Bukor <abu...@apache.org> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Greg Solovyev <gsolov...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mitch Barnett <mbarn...@cloudera.com> Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Thu, 01 Nov 2018 18:00:38 +0000 Gerrit-HasComments: Yes