I'm running a Spark SQL benchmark similar to the code in
https://spark.apache.org/docs/latest/sql-programming-guide.html
(section: Inferring the Schema Using Reflection**). What's the simplest way
to time the SQL statement itself, so that I'm not also timing the
.map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)) part of the
RDD creation? I'm using a few calls to System.nanoTime() for timing; a
rough sketch of what I mean follows the code below.

Arun

**
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD

case class Person(name: String, age: Int)
val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
people.registerTempTable("people")

val teenagers = sqlContext.sql(
  "SELECT name FROM people WHERE age >= 13 AND age <= 19")

teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
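
For reference, here is a rough sketch of the timing I have in mind (the
cache()/count() step and names like elapsedMs are my own, not from the
guide): materialize the RDD before the timed region, and keep the
collect() inside it, since sqlContext.sql() is lazy and does no work
until an action runs.

// Materialize the RDD up front so the textFile/split/Person mapping
// is not part of the timed region.
people.cache()
people.count()
people.registerTempTable("people")

// Time only the query itself; collect() forces execution, because
// sql() just builds a plan and returns immediately.
val start = System.nanoTime()
val teenagers = sqlContext.sql(
  "SELECT name FROM people WHERE age >= 13 AND age <= 19")
val names = teenagers.map(t => "Name: " + t(0)).collect()
val elapsedMs = (System.nanoTime() - start) / 1e6
println(s"SQL query took $elapsedMs ms")
names.foreach(println)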
