Hello,

We deployed Apache Spark 1.3.0 and Apache Zeppelin (built against Spark 1.3.0) on a Hadoop cluster with one NameNode and two DataNodes. Both run in yarn-client mode, so the setup and preconditions are identical.
We executed several SQL queries through both the Zeppelin frontend and the Spark SQL shell. For example, we ran queries with five join conditions, as well as queries on a pre-joined dataset with more than 1,000,000 records.

The Spark SQL shell was much faster than Zeppelin. The same queries ran 4x to 40x faster in the shell than in Zeppelin.

Has anyone seen something similar? Why does Zeppelin add so much overhead when the same engine runs "under the hood"? How does Zeppelin handle queries? Are they passed directly to Spark, or does Zeppelin apply any optimizations of its own?

Kind regards,
Tobias
