Hi Tim,

Thanks for your response.
The benchmark I used just reads data in from HDFS and builds a linear regression model using methods from MLlib. Unfortunately, for various reasons, I can't open the source code for the benchmark at this time. I will try to replicate the problem using some of the sample benchmarks provided with the vanilla Spark distribution. It is very possible that I have something very screwy in my workload or setup.

The parameters I used for Spark on Mesos are the following:

driver-memory = 1g
total-executor-cores = 60
spark.executor.memory = 6g
spark.storage.memoryFraction = 0.9
spark.mesos.coarse = true

The rest are default values, so spark.locality.wait should just be 3000ms. I launched the Spark job with spark-submit from a separate node outside the 10-node cluster.

With regard to Mesos in fine-grained mode, do you have a feel for the overhead of launching executors for every task? Of course, any perceived slowdown will probably be very workload-dependent. I just want a feel for the possible overhead (e.g., a factor of 2 or 3 slowdown?). If this is not a data-locality issue, perhaps that overhead is a factor in the slowdown I observed, at least in the fine-grained case.

BTW: I'm using Spark 1.1.0 and Mesos 0.20.0.

Thanks,
Mike

From: Tim Chen <t...@mesosphere.io>
To: Michael V Le/Watson/IBM@IBMUS
Cc: user <user@spark.apache.org>
Date: 01/08/2015 03:04 PM
Subject: Re: Data locality running Spark on Mesos

How did you run this benchmark, and is there an open version I can try it with?

And what are your configurations, like spark.locality.wait, etc.?

Tim

On Thu, Jan 8, 2015 at 11:44 AM, mvle <m...@us.ibm.com> wrote:

Hi,

I've noticed that running Spark apps on Mesos is significantly slower compared to stand-alone Spark or Spark on YARN. I don't think that should be the case, so I am posting the problem here in case someone has an explanation or can point me to some configuration options I've missed.

I'm running the LinearRegression benchmark with a dataset of 48.8 GB.
On a 10-node stand-alone Spark cluster (each node 4-core, 8 GB of RAM), I can finish the workload in about 5 min (I don't remember exactly). The data is loaded into HDFS spanning the same 10-node cluster. There are 6 worker instances per node.

However, when running the same workload on the same cluster but now with Spark on Mesos (coarse-grained mode), the execution time is somewhere around 15 min. I also tried fine-grained mode, giving each Mesos node 6 VCPUs (to hopefully get 6 executors like the stand-alone test), and I still get roughly 15 min.

I've noticed that when Spark is running on Mesos, almost all tasks execute with locality NODE_LOCAL (even with Mesos in coarse-grained mode). On stand-alone, the locality is mostly PROCESS_LOCAL. I think this locality issue might be the reason for the slowdown, but I can't figure out why, especially in coarse-grained mode, since the executors supposedly do not go away until job completion.

Any ideas?

Thanks,
Mike

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
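[Editor's note] For reference, the settings quoted in the thread would correspond to a spark-submit invocation along these lines. This is only a sketch for Spark 1.1.0: the Mesos master URL, benchmark class name, jar name, and HDFS path are hypothetical placeholders, not taken from the thread.

```shell
# Sketch of a spark-submit invocation matching the settings in the thread.
# <mesos-master>, the class name, jar, and HDFS path are placeholders.
spark-submit \
  --class com.example.LinearRegressionBenchmark \
  --master mesos://<mesos-master>:5050 \
  --driver-memory 1g \
  --total-executor-cores 60 \
  --conf spark.executor.memory=6g \
  --conf spark.storage.memoryFraction=0.9 \
  --conf spark.mesos.coarse=true \
  benchmark.jar hdfs://<namenode>/path/to/dataset
```

Everything not listed here (including spark.locality.wait, default 3000 ms in this version) stays at its default value.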