If you want a performance boost, you need to load the full table into memory using caching and then execute your query directly on the cached DataFrame. Otherwise you are using Spark only as a bridge to MySQL and you don't leverage its distributed in-memory engine.
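
One way to check whether caching pays off is to time the first query separately from the ones that follow: Spark fills a cache lazily on the first action, so only later queries read from memory. Below is a minimal sketch in plain Java with no Spark dependency. `WarmColdTimer`, `measure`, and the fake query are hypothetical names made up for illustration; in a real test the `Supplier` would wrap a query against the cached DataFrame.

```java
import java.util.function.Supplier;

/** Times the first ("cold") run of a query separately from later ("warm") runs. */
final class WarmColdTimer {

    /** First-run time and mean time of the later runs, both in milliseconds. */
    static final class Result {
        final double firstMs;
        final double warmAvgMs;

        Result(double firstMs, double warmAvgMs) {
            this.firstMs = firstMs;
            this.warmAvgMs = warmAvgMs;
        }
    }

    /** Runs the query once, then warmRuns more times, timing the two phases apart. */
    static <T> Result measure(Supplier<T> query, int warmRuns) {
        if (warmRuns < 1) {
            throw new IllegalArgumentException("warmRuns must be >= 1");
        }
        long coldStart = System.nanoTime();
        query.get(); // first run pays any one-time cost (startup, cache fill)
        double firstMs = (System.nanoTime() - coldStart) / 1e6;

        long warmStart = System.nanoTime();
        for (int i = 0; i < warmRuns; i++) {
            query.get(); // later runs should only pay the per-query cost
        }
        double warmAvgMs = (System.nanoTime() - warmStart) / 1e6 / warmRuns;
        return new Result(firstMs, warmAvgMs);
    }

    public static void main(String[] args) {
        // Stand-in for a real query: the first call simulates a one-time setup cost.
        final boolean[] initialised = {false};
        Supplier<Integer> fakeQuery = () -> {
            if (!initialised[0]) {
                try {
                    Thread.sleep(50); // assumed setup cost, purely illustrative
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                initialised[0] = true;
            }
            return 42;
        };

        Result r = measure(fakeQuery, 10);
        System.out.printf("first run: %.1f ms, warm average: %.3f ms%n",
                r.firstMs, r.warmAvgMs);
    }
}
```

If the warm average stays close to the first-run time, the per-query overhead dominates and caching alone is unlikely to close the gap with direct JDBC access.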
Paolo

Sent from my Windows Phone
________________________________
From: Louis Hust <[email protected]>
Sent: 26/07/2015 10:28
To: Shixiong Zhu <[email protected]>
Cc: Jerrick Hoang <[email protected]>; [email protected]
Subject: Re: Spark is much slower than direct access MySQL

Thanks for your explanation.

2015-07-26 16:22 GMT+08:00 Shixiong Zhu <[email protected]>:

Oh, I see. That's the total time of executing a query in Spark. Then the difference is reasonable, considering Spark has much more work to do, e.g., launching tasks in executors.

Best Regards,
Shixiong Zhu

2015-07-26 16:16 GMT+08:00 Louis Hust <[email protected]>:

Look at the given URL. Code can be found at:
https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java

2015-07-26 16:14 GMT+08:00 Shixiong Zhu <[email protected]>:

Could you clarify how you measure the Spark time cost? Is it the total time of running the query? If so, the difference is plausible, because the overhead of Spark dominates for small queries.

Best Regards,
Shixiong Zhu

2015-07-26 15:56 GMT+08:00 Jerrick Hoang <[email protected]>:

How big is the dataset? How complicated is the query?

On Sun, Jul 26, 2015 at 12:47 AM Louis Hust <[email protected]> wrote:

Hi all,

I am using a Spark DataFrame to fetch a small table from MySQL, and I found it costs much more than accessing MySQL directly using JDBC. The time cost for Spark is about 2033 ms, and for direct access about 16 ms.

Code can be found at:
https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java

So is my Spark configuration wrong? How can I optimise Spark to achieve performance similar to direct access?

Any ideas will be appreciated!
