Hi,

Zeppelin internally uses sqlContext.sql() to execute queries and take() to get the results.
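To see where the time goes, the two steps could be timed separately. Below is a rough sketch in plain Python, not Zeppelin's actual code. The helper name time_phases and the stand-in demo are made up for illustration, and no Spark is needed to run it.

```python
import time

def time_phases(execute, fetch):
    """Run execute(), then fetch(result), timing each phase separately."""
    t0 = time.perf_counter()
    result = execute()          # with Spark, e.g. lambda: sqlContext.sql(query)
    t1 = time.perf_counter()
    rows = fetch(result)        # with Spark, e.g. lambda df: df.take(max_rows)
    t2 = time.perf_counter()
    return rows, {"execute_s": t1 - t0, "fetch_s": t2 - t1}

# Stand-in demo without Spark: "execute" builds a lazy range,
# "fetch" materializes only the first 3 items.
rows, timings = time_phases(lambda: range(1000000), lambda r: list(r[:3]))
print(rows, timings)
```

In a pyspark session the same helper could be called as time_phases(lambda: sqlContext.sql(query), lambda df: df.take(max_rows)), where query and max_rows are placeholders you would fill in yourself. For a SELECT, sql() only builds the query plan and take() triggers the actual job, so most of the Spark time should show up in the fetch phase. Anything beyond that in Zeppelin would point to transfer or rendering.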
There might be some overhead in transferring the results to the web GUI and rendering them, but I guess the rest of the process is the same. I am also curious whether other people have experienced a similar problem.

Best,
moon

On Fri, May 15, 2015 at 10:49 PM Tobias Bockrath <[email protected]> wrote:

> Hello,
>
> we deployed Apache Spark 1.3.0 and Apache Zeppelin built with Spark 1.3.0
> in a Hadoop cluster with one NameNode and two DataNodes. Both are running
> in yarn-client mode, so the setup and the preconditions are equal.
>
> We executed several SQL queries via the Zeppelin frontend and via the
> SparkSQL shell. For example, we tried queries with 5 join conditions. We
> also tried queries on a pre-joined dataset with more than 1,000,000 records.
>
> We found that the SparkSQL shell executes queries much faster than
> Zeppelin does. In fact, SparkSQL queries ran 4x to 40x faster than the
> same queries executed with Zeppelin.
>
> Does anyone have similar experiences? Why does Zeppelin have such an
> overhead although there is the same engine "under the hood"? How does
> Zeppelin handle queries? Are they passed to Spark directly, or are there
> any optimizations?
>
> Kind regards
> Tobias
