I’m going to be running some benchmarks against it on Monday on an AWS cluster. I’ll let you know what I come up with.
Dean

On 16 May 2015, at 00:30, moon soo Lee <[email protected]> wrote:

Hi,

Zeppelin internally uses sqlContext.sql to execute queries and take() to get the results. There might be some overhead from transferring the result to the web GUI and rendering it, but I guess the rest of the process is the same.

I'm also curious whether other people have experienced a similar problem.

Best,
moon

On Fri, 15 May 2015 at 10:49 PM, Tobias Bockrath <[email protected]> wrote:

Hello,

We deployed Apache Spark 1.3.0 and Apache Zeppelin built against Spark 1.3.0 on a Hadoop cluster with one NameNode and two DataNodes. Both run in yarn-client mode, so the setup and the preconditions are identical.

We executed several SQL queries via the Zeppelin frontend and via the Spark SQL shell. For example, we tried queries with 5 join conditions, as well as queries on a pre-joined dataset with more than 1,000,000 records. We found that the Spark SQL shell is much faster than Zeppelin; in fact, queries in the Spark SQL shell ran 4x-40x faster than the equivalent queries executed through Zeppelin.

Does anyone have similar experiences? Why does Zeppelin have such an overhead although the same engine is "under the hood"? How does Zeppelin handle queries? Are they passed to Spark directly, or are there any optimizations?

Kind regards
Tobias
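
For reference, a minimal sketch of the two code paths moon describes, assuming the Spark 1.3 API; the table name "my_table", the row limit of 100, and the application name are made up for illustration, and in the shell or Zeppelin the sqlContext would already exist rather than being constructed as below.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SqlPathsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sql-paths-sketch"))
    val sqlContext = new SQLContext(sc)

    // Roughly what the Spark SQL shell does: run the query and pull rows back to the driver.
    val shellRows = sqlContext.sql("SELECT * FROM my_table LIMIT 100").collect()

    // Roughly what Zeppelin does per moon's description: the same sqlContext.sql,
    // then take() a bounded number of rows, which are afterwards serialized to the
    // web GUI and rendered as a table.
    val zeppelinRows = sqlContext.sql("SELECT * FROM my_table").take(100)

    println(s"shell path: ${shellRows.length} rows, zeppelin path: ${zeppelinRows.length} rows")
    sc.stop()
  }
}

In both paths the query planning and execution happen in the same Spark engine, which is why any large difference would have to come from the steps around the query (result transfer, rendering, interpreter setup) rather than from the SQL execution itself.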
