I tried two Spark stand-alone configurations:
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_INSTANCES=6
spark.driver.memory 1g
spark.executor.memory 1g
spark.storage.memoryFraction 0.9
--total-executor-cores 60

In the second configuration (same as first, but):
SPARK_WORKER_CORES=6
SPARK_WORKER_MEMORY=6g
SPARK_WORKER_INSTANCES=1
spark.executor.memory 6g
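
As an aside, one quick way to confirm how many executors actually registered
under each configuration is to query the driver from the spark-shell; a
minimal sketch (sc is the shell's SparkContext):

// Each key is the host:port of a registered executor (plus the driver).
// With the first configuration I'd expect roughly 60 entries across the
// 10 nodes; with the second, roughly 10.
sc.getExecutorMemoryStatus.keys.foreach(println)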

Runs using the first configuration had faster execution times than Spark runs
on my configuration of Mesos (both coarse-grained and fine-grained).
Runs using the second configuration had about the same execution time as with
Mesos.
Looking at the logs again, it looks like the locality info between the
stand-alone and Mesos coarse-grained modes is very similar.
I must have been hallucinating earlier thinking somehow the data locality
information was different.

So this whole thing might simply be due to the fact that it is not currently
possible in Mesos to set the number of executors.
Even in fine-grained mode, there seems to be just one executor per node (I
had thought differently in my previous message).
The workloads I've tried apparently perform better with many executors per
node than with a single powerful executor per node.

It would be really useful once the feature you mentioned:
https://issues.apache.org/jira/browse/SPARK-5095
is implemented.

Spark on Mesos fine-grained configuration:
driver memory = 1G
spark.executor.memory 6g  (also tried 1g; still one executor per node and
roughly the same execution time)
spark.storage.memoryFraction 0.9


Mike




From:   Timothy Chen <t...@mesosphere.io>
To:     Michael V Le/Watson/IBM@IBMUS
Cc:     user <user@spark.apache.org>
Date:   01/10/2015 04:31 AM
Subject:        Re: Data locality running Spark on Mesos



Hi Michael,

I see you capped the cores to 60.

I wonder what settings you used for the standalone mode that you compared
with?

I can try to run an MLlib workload on both to compare.

Tim

On Jan 9, 2015, at 6:42 AM, Michael V Le <m...@us.ibm.com> wrote:



      Hi Tim,

      Thanks for your response.

      The benchmark I used just reads data in from HDFS and builds a
      linear regression model using methods from MLlib.
      Unfortunately, for various reasons, I can't share the source code for
      the benchmark at this time.
      I will try to replicate the problem using some sample benchmarks
      provided by the vanilla Spark distribution.
      It is very possible that I have something very screwy in my workload
      or setup.
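
      For reference, the workload is roughly of the following shape (a
      hypothetical stand-in, since I can't share the source; the HDFS path,
      record format, and iteration count below are made up):

      import org.apache.spark.mllib.linalg.Vectors
      import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

      // Parse one labeled record per line: "label,feature1,feature2,..."
      val data = sc.textFile("hdfs:///path/to/dataset")
        .map { line =>
          val parts = line.split(',').map(_.toDouble)
          LabeledPoint(parts(0), Vectors.dense(parts.tail))
        }
        .cache()

      // Fit a linear model with MLlib's SGD-based solver.
      val model = LinearRegressionWithSGD.train(data, 100)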

      The parameters I used for the Spark on Mesos are the following:
      driver memory = 1G
      total-executor-cores = 60
      spark.executor.memory 6g
      spark.storage.memoryFraction 0.9
      spark.mesos.coarse = true

      The rest are default values, so spark.locality.wait should just be
      3000ms.
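
      (If locality does turn out to be the culprit, one experiment I might
      try is raising that wait; a minimal sketch, assuming the conf is built
      programmatically rather than passed via spark-submit:)

      import org.apache.spark.{SparkConf, SparkContext}

      // Hold out longer for PROCESS_LOCAL slots before the scheduler falls
      // back to NODE_LOCAL; the default wait is 3000 ms.
      val conf = new SparkConf()
        .setAppName("locality-wait-test")
        .set("spark.locality.wait", "10000")
      val sc = new SparkContext(conf)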

      I launched the Spark job on a separate node from the 10-node cluster
      using spark-submit.

      With regard to Mesos in fine-grained mode, do you have a feel for the
      overhead of launching executors for every task? Of course, any
      perceived slowdown will probably be very dependent on the workload; I
      just want a feel for the possible overhead (e.g., a factor of 2 or 3
      slowdown?). If it is not a data locality issue, perhaps this overhead
      is a factor in the slowdown I observed, at least in the fine-grained
      case.

      BTW: I'm using Spark version 1.1.0 and Mesos version 0.20.0

      Thanks,
      Mike



      From: Tim Chen <t...@mesosphere.io>
      To: Michael V Le/Watson/IBM@IBMUS
      Cc: user <user@spark.apache.org>
      Date: 01/08/2015 03:04 PM
      Subject: Re: Data locality running Spark on Mesos





      How did you run this benchmark, and is there an open version I can try
      it with?

      And what are your configurations, like spark.locality.wait, etc.?

      Tim

      On Thu, Jan 8, 2015 at 11:44 AM, mvle <m...@us.ibm.com> wrote:
            Hi,

            I've noticed that running Spark apps on Mesos is significantly
            slower compared to
            stand-alone or Spark on YARN.
            I don't think this should be the case, so I am posting the
            problem here in
            case someone has an explanation
            or can point me to some configuration options I've missed.

            I'm running the LinearRegression benchmark with a dataset of
            48.8GB.
            On a 10-node stand-alone Spark cluster (each node with 4 cores
            and 8GB of RAM),
            I can finish the workload in about 5 minutes (I don't remember
            exactly).
            The data is loaded into HDFS spanning the same 10-node cluster.
            There are 6 worker instances per node.

            However, when running the same workload on the same cluster but
            now with Spark on Mesos (coarse-grained mode), the execution
            time is somewhere around 15 minutes. I also tried fine-grained
            mode, giving each Mesos node 6 VCPUs (to hopefully get 6
            executors like the stand-alone test), and I still get roughly
            15 minutes.

            I've noticed that when Spark is running on Mesos, almost all
            tasks execute with locality NODE_LOCAL (even with Mesos in
            coarse-grained mode). On stand-alone, the locality is mostly
            PROCESS_LOCAL.

            I think this locality issue might be the reason for the
            slowdown, but I can't figure out why, especially for
            coarse-grained mode, as the executors supposedly do not go away
            until job completion.

            Any ideas?

            Thanks,
            Mike




            