Hi Tim,

Thanks for your response.

The benchmark I used just reads data in from HDFS and builds a linear
regression model using methods from MLlib.
Unfortunately, for various reasons, I can't open up the source code for the
benchmark at this time.
I will try to replicate the problem using some of the sample benchmarks
shipped with the vanilla Spark distribution.
It is very possible that I have something screwy in my workload or setup.

The parameters I used for Spark on Mesos are the following:
driver-memory = 1g
total-executor-cores = 60
spark.executor.memory = 6g
spark.storage.memoryFraction = 0.9
spark.mesos.coarse = true

The rest are default values, so spark.locality.wait should just be 3000ms.

I launched the Spark job on a separate node from the 10-node cluster using
spark-submit.
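For concreteness, the invocation was along these lines (the master URL, class name, and paths below are placeholders, not my exact values):

```shell
# Sketch of the spark-submit command used; the host name, benchmark class,
# and jar/HDFS paths are placeholders
spark-submit \
  --master mesos://mesos-master:5050 \
  --driver-memory 1g \
  --total-executor-cores 60 \
  --conf spark.executor.memory=6g \
  --conf spark.storage.memoryFraction=0.9 \
  --conf spark.mesos.coarse=true \
  --class com.example.LinearRegressionBenchmark \
  benchmark.jar hdfs://namenode:9000/path/to/dataset
```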

With regards to Mesos in fine-grained mode, do you have a feel for the
overhead of launching executors for every task? Of course, any perceived
slowdown will probably be very dependent on the workload. I just want a
feel for the possible overhead (e.g., a factor of 2 or 3 slowdown?).
If it is not a data locality issue, perhaps this overhead is a factor in the
slowdown I observed, at least in the fine-grained case.
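For what it's worth, one rough way I could try to isolate the scheduler overhead is to time a cheap job with many small tasks under both modes (a sketch; the master URL and examples jar path are placeholders):

```shell
# Time SparkPi with 1000 partitions (many short tasks, little real work)
# under coarse-grained vs fine-grained Mesos scheduling
for coarse in true false; do
  echo "spark.mesos.coarse=$coarse"
  time spark-submit \
    --master mesos://mesos-master:5050 \
    --conf spark.mesos.coarse=$coarse \
    --class org.apache.spark.examples.SparkPi \
    /path/to/spark-examples.jar 1000
done
```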

BTW: I'm using Spark 1.1.0 and Mesos 0.20.0.

Thanks,
Mike




From:   Tim Chen <t...@mesosphere.io>
To:     Michael V Le/Watson/IBM@IBMUS
Cc:     user <user@spark.apache.org>
Date:   01/08/2015 03:04 PM
Subject:        Re: Data locality running Spark on Mesos



How did you run this benchmark, and is there an open version I can try it
with?

And what are your configurations, like spark.locality.wait, etc.?

Tim

On Thu, Jan 8, 2015 at 11:44 AM, mvle <m...@us.ibm.com> wrote:
  Hi,

  I've noticed that running Spark apps on Mesos is significantly slower
  compared to stand-alone or Spark on YARN.
  I don't think this should be the case, so I am posting the problem here in
  case someone has an explanation or can point me to some configuration
  options I've missed.

  I'm running the LinearRegression benchmark with a dataset of 48.8GB.
  On a 10-node stand-alone Spark cluster (each node 4-core, 8GB of RAM),
  I can finish the workload in about 5min (I don't remember exactly).
  The data is loaded into HDFS spanning the same 10-node cluster.
  There are 6 worker instances per node.

  However, when running the same workload on the same cluster but now with
  Spark on Mesos (coarse-grained mode), the execution time is somewhere
  around 15min. I also tried fine-grained mode, giving each Mesos node 6
  VCPUs (to hopefully get 6 executors like the stand-alone test), and I
  still get roughly 15min.

  I've noticed that when Spark is running on Mesos, almost all tasks execute
  with locality NODE_LOCAL (even with Mesos in coarse-grained mode). On
  stand-alone, the locality is mostly PROCESS_LOCAL.

  I think this locality issue might be the reason for the slowdown, but I
  can't figure out why, especially for coarse-grained mode, as the executors
  supposedly do not go away until job completion.

  Any ideas?

  Thanks,
  Mike



  --
  View this message in context:
  
http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html

  Sent from the Apache Spark User List mailing list archive at Nabble.com.

  ---------------------------------------------------------------------
  To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
  For additional commands, e-mail: user-h...@spark.apache.org
