Arun,

On Thu, Sep 18, 2014 at 9:52 AM, Arun Luthra <arun.lut...@gmail.com> wrote:

> I'm doing a spark SQL benchmark similar to the code in
> https://spark.apache.org/docs/latest/sql-programming-guide.html
> (section: Inferring the Schema Using Reflection). What's the simplest way
> to time the SQL statement itself, so that I'm not timing
> the .map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)) part of the
> RDD creation? I'm using a few calls to System.nanoTime() for timing.
>

To isolate that part, I think you should call persist() on the RDD after
the split(...)/map(...) steps and then access the data with an action
(such as count()) so that the computation is actually executed. If you
then use the same RDD in your SQL statement, it should read the persisted
data, and you should more or less be measuring only the time needed for
the SQL processing.
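
An untested sketch of what I mean (assuming the spark-shell, so sc
already exists; the file name and the query are just the placeholders
from the programming guide, so adjust them to your code):

// the Person case class and SQLContext setup from the guide
case class Person(name: String, age: Int)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD

val people = sc.textFile("people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

// pay the parsing cost here, outside the timed part
people.persist()
people.count()

people.registerTempTable("people")  // registerAsTable() on Spark 1.0.x

// now (roughly) only the SQL processing is timed
val t0 = System.nanoTime()
val teenagers = sqlContext.sql(
  "SELECT name FROM people WHERE age >= 13 AND age <= 19")
val n = teenagers.count()  // an action, so the query actually runs
val t1 = System.nanoTime()
println(s"SQL took ${(t1 - t0) / 1e6} ms ($n rows)")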
OK?

Tobias
