Re: Getting the execution times of spark job

2014-09-02 Thread Zongheng Yang
For your second question: hql() (as well as sql()) does not launch a
Spark job immediately; instead, it first runs the Spark SQL
parser/optimizer/planner pipeline, and a Spark job is started only
after a physical execution plan has been selected. Therefore, your
hand-rolled end-to-end measurement includes the time spent in the
Spark SQL code path, whereas the times reported in the UI are the
execution times of the Spark job(s) only.
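
As a rough illustration (not part of the original thread), one way to see
the two parts separately is to time the hql() call and a subsequent action
independently. A minimal sketch, assuming the same 1.0.x Java API as in
your snippet; the app name, table name and context setup are placeholders:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class HqlTiming {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[2]", "hql-timing");
    JavaHiveContext hiveCtx = new JavaHiveContext(sc);
    String query = "SELECT * FROM some_table"; // placeholder query

    long t0 = System.nanoTime();
    JavaSchemaRDD result = hiveCtx.hql(query); // Spark SQL parse/optimize/plan path
    long t1 = System.nanoTime();

    long rows = result.count();                // action: the Spark job(s) run here
    long t2 = System.nanoTime();

    System.out.println("Spark SQL pipeline (ms): " + (t1 - t0) / 1000000L);
    System.out.println("Job execution (ms): " + (t2 - t1) / 1000000L);
    System.out.println("Rows: " + rows);

    sc.stop();
  }
}

The second number should track what the UI reports for the job(s) much more
closely; the first is the overhead your end-to-end measurement adds on top.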

On Mon, Sep 1, 2014 at 11:45 PM, Niranda Perera  wrote:
> Hi,
>
> I have been playing around with Spark for a couple of days. I am
> using spark-1.0.1-bin-hadoop1 and the Java API. The main idea of the
> implementation is to run Hive queries on Spark. I used JavaHiveContext to
> achieve this (as per the examples).
>
> I have 2 questions.
> 1. How could I get the execution times of a Spark job? Does
> Spark provide monitoring facilities in the form of an API?
>
> 2. I used a layman's way to get the execution times by wrapping a
> JavaHiveContext.hql() call with System.nanoTime(), as follows:
>
> long start, end;
> JavaHiveContext hiveCtx;
> JavaSchemaRDD hiveResult;
>
> start = System.nanoTime();
> hiveResult = hiveCtx.hql(query);
> end = System.nanoTime();
> System.out.println(end - start); // elapsed time in nanoseconds
>
> But the result I got is drastically different from the execution times
> recorded in the Spark UI. Can you please explain this disparity?
>
> Look forward to hearing from you.
>
> rgds
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Getting the execution times of spark job

2014-09-02 Thread Niranda Perera
Hi,

I have been playing around with Spark for a couple of days. I am
using spark-1.0.1-bin-hadoop1 and the Java API. The main idea of the
implementation is to run Hive queries on Spark. I used JavaHiveContext to
achieve this (as per the examples).

I have 2 questions.
1. How could I get the execution times of a Spark job? Does
Spark provide monitoring facilities in the form of an API?

2. I used a layman's way to get the execution times by wrapping a
JavaHiveContext.hql() call with System.nanoTime(), as follows:

long start, end;
JavaHiveContext hiveCtx;
JavaSchemaRDD hiveResult;

start = System.nanoTime();
hiveResult = hiveCtx.hql(query);
end = System.nanoTime();
System.out.println(end - start); // elapsed time in nanoseconds

But the result I got is drastically different from the execution times
recorded in the Spark UI. Can you please explain this disparity?

Look forward to hearing from you.

rgds

-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44