I’m going to be running some benchmarks against it on Monday on an AWS cluster. 
I’ll let you know what I come up with.

Dean

On 16 May 2015, at 00:30, moon soo Lee <[email protected]> wrote:

Hi,

Zeppelin internally uses sqlContext.sql to execute queries and take() to 
get the results.

There might be some overhead from transferring the result to the web GUI and 
rendering it, but I guess the rest of the process is the same.
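
One way to check where the time goes is to time each stage separately. 
The following is only a rough sketch in plain Python, not Zeppelin or Spark 
code: time_stages and the three stand-in functions (run_query, fetch_rows, 
render) are hypothetical names made up for illustration, and they would need 
to be replaced with the real query call, result fetch and rendering step.

```python
import time


def time_stages(stages):
    """Run (name, fn) stages in order, passing each result to the next stage.

    Returns (final_result, {name: elapsed_seconds}).
    """
    timings = {}
    value = None
    for name, fn in stages:
        start = time.perf_counter()
        value = fn(value)
        timings[name] = time.perf_counter() - start
    return value, timings


# Hypothetical stand-ins for the real stages being measured.
def run_query(_):
    # Placeholder for executing the query.
    return list(range(100_000))


def fetch_rows(rows):
    # Placeholder for fetching a limited number of result rows.
    return rows[:1000]


def render(rows):
    # Placeholder for serializing/rendering the rows for display.
    return "\n".join(str(r) for r in rows)


if __name__ == "__main__":
    _, timings = time_stages(
        [("query", run_query), ("fetch", fetch_rows), ("render", render)]
    )
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.6f}s")
```

If most of the elapsed time shows up outside the query stage, that would 
point at transfer or rendering rather than at query execution itself.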

I'm also curious whether other people have experienced a similar problem.

Best,
moon

On Fri, 15 May 2015 at 10:49 PM, Tobias Bockrath <[email protected]> wrote:
Hello,

we deployed Apache Spark 1.3.0 and Apache Zeppelin built with Spark 1.3.0 on a 
Hadoop cluster with one NameNode and two DataNodes. Both are running in 
yarn-client mode, so the setup and the preconditions are the same.

We executed several SQL queries via the Zeppelin frontend and via the SparkSQL 
shell. For example, we tried queries with 5 join conditions. We also tried 
queries on a pre-joined dataset with more than 1,000,000 records.

We found that the SparkSQL shell executes queries much faster than Zeppelin. 
In fact, queries ran 4x to 40x faster in the SparkSQL shell than the same 
queries executed with Zeppelin.

Does anyone have similar experiences? Why does Zeppelin have such an overhead 
although there's the same engine "under the hood"? How does Zeppelin handle 
queries? Are they passed to Spark directly, or are there any optimizations?

Kind regards
Tobias

