Hi,

Zeppelin internally uses sqlContext.sql() to execute queries and take()
to fetch the results.

There might be some overhead from transferring the results to the web GUI
and rendering them, but I guess the rest of the process is the same.
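
One way to see where the time goes is to time the query call and the result
fetch separately. Below is a minimal sketch in Python, not how Zeppelin
measures anything. The helper `timed` and the stand-in functions `run_query`
and `fetch_rows` are hypothetical names for illustration. In a real PySpark
session they would be replaced by calls such as sqlContext.sql(query) and
df.take(n).

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn, print how long it took, and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


# Stand-ins for illustration only (assumptions, not Spark or Zeppelin code).
# With Spark, run_query would be sqlContext.sql(query) and
# fetch_rows would be df.take(n).
def run_query():
    return list(range(1000))


def fetch_rows(rows, n):
    return rows[:n]


rows, t_query = timed("query", run_query)
first, t_fetch = timed("fetch", fetch_rows, rows, 10)
```

Note that in Spark the sql() call for a SELECT mostly builds a plan, and the
work is triggered by the action (take()), so most of the time is likely to
show up in the fetch step. Running the same measurement in both frontends
would show whether the gap comes from execution or from what happens after it.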

I am also curious whether other people have experienced a similar problem.

Best,
moon

On Fri, May 15, 2015 at 10:49 PM Tobias Bockrath <[email protected]> wrote:

> Hello,
>
> We deployed Apache Spark 1.3.0 and Apache Zeppelin built with Spark 1.3.0
> in a Hadoop cluster with one NameNode and two DataNodes. Both are running
> in yarn-client mode, so the setup and the preconditions are the same.
>
> We executed several SQL queries via the Zeppelin frontend and via the
> SparkSQL shell. For example, we tried queries with 5 join conditions. We
> also tried queries on a pre-joined dataset with more than 1,000,000 records.
>
> We found that the SparkSQL shell executes queries much faster than
> Zeppelin. In fact, queries in the SparkSQL shell ran 4x to 40x faster
> than the same queries executed in Zeppelin.
>
> Has anyone had similar experiences? Why does Zeppelin have such an overhead
> although the same engine is running "under the hood"? How does Zeppelin
> handle queries? Are they passed to Spark directly, or are any optimizations
> applied?
>
> Kind regards
> Tobias
>
>
