Hello,

we deployed Apache Spark 1.3.0 and Apache Zeppelin built with Spark 1.3.0 in
a Hadoop cluster with one NameNode and two DataNodes. Both run in
yarn-client mode, so the setup and the preconditions are identical.

We executed several SQL queries via the Zeppelin frontend and via the
SparkSQL shell. For example, we ran queries with five join conditions, as
well as queries on a pre-joined dataset with more than 1,000,000 records.

We found that queries run much faster in the SparkSQL shell than in
Zeppelin. In fact, queries executed in the SparkSQL shell were 4x to 40x
faster than the same queries executed in Zeppelin.

Has anyone had similar experiences? Why does Zeppelin add so much overhead
when the same engine runs "under the hood"? How does Zeppelin handle
queries? Are they passed to Spark directly, or are there any optimizations?

Kind regards
Tobias
