On 04/11/2015 02:50 PM, Timothy Chen wrote:
Hi James,

You are right, multiple frameworks become a different discussion about how to
adjust and allow more dynamic resource negotiation to happen, while also
factoring in fairness and other concerns.

Agreed, there will be many more parameters to define so that schedulers and other resource managers can do their job. Maybe a list where those parameters are categorized, kept in a central location, is a good start at characterizing that part of the problem. Different frameworks may need to pass different parameters around and to the resource manager(s).


There is more work happening in Mesos to try to address multiple frameworks,
such as optimistic offers and inverse offers, but I think that in terms of
dynamic memory needs for a framework it is still largely up to the scheduler
to specify and scale accordingly as resources are needed or no longer needed.

The distributed applications (frameworks) will have to provide frequently updated info to the resource managers. Perhaps some typical examples of such details would be good info to start collecting, analyzing and organizing? Also, the memory/core ratio of a given cluster will differ from one cluster to the next.
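
To put made-up numbers on that: a node with 32 cores and 128 GB of RAM works out to 4 GB per core, while a node with 16 cores and 256 GB works out to 16 GB per core. Any fixed per-executor memory setting tuned for the first cluster will either starve or waste memory on the second, so the ratio itself probably needs to be part of whatever the frameworks and resource managers exchange.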


One way this is being addressed in Spark is by integrating dynamic allocation
into resource schedulers such as Mesos and YARN, but more work is still
needed, as dynamic allocation only looks at certain metrics that might not
address all kinds of needs.

Yes, both the frameworks and the resource manager will have to pass information back and forth in a low-latency, dynamic fashion, along with their other communication needs. I'm sure the number of metrics will grow as folks dig deeper into these issues.
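
For reference, the dynamic allocation Tim mentions is driven by a handful of settings (this is roughly what it looks like on YARN today; the Mesos integration is still in progress, so treat the values below as an illustrative sketch, not a recommendation):

  spark.dynamicAllocation.enabled        true
  spark.dynamicAllocation.minExecutors   2
  spark.dynamicAllocation.maxExecutors   20
  # dynamic allocation requires the external shuffle service
  spark.shuffle.service.enabled          true

As Tim notes, it scales mostly on the pending-task backlog and on idle executors, which says nothing about how much memory an individual task actually needs.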


If you have any specific use cases or examples that you think existing work
doesn't fit and that you would like to see addressed, that would be a good
way to start the conversation.

I seriously doubt there is a single "silver bullet" here, or even a few silver bullets that can get the job done. My suspicion is that as we run an expanding array of distributed applications on Mesos, we'll eventually gather enough "profile" information on these distributed applications to be able to sort them into quasi-categories. Profiling running code has always been an edgy endeavor, but I see no other reliable way to figure out anything close to optimal resource use for a cluster. We can then figure out what mix of categorized distributed applications best fits the dynamic resource model, then tune the RM to that organization's priority semantics for resource allocation. Once new frameworks are profiled, the Resource Manager can make better decisions.

I think the real trick is going to be coordinating the low-level kernel resources that we can profile, with tools such as kernelshark, with system-level resource monitoring, so as to reinforce the cluster resource management decisions. That is, we're going to have to run a wide variety of these frameworks in isolation, and use both kernel- and system-level resource monitoring tools to accurately characterize the resource loading curves. Then we can propose dynamic methods to manage the resource demands of a given category of frameworks.

Eventually, we could then go down the path of profiling the cluster as multiple simultaneous resource loads are mixed, due to a variety of frameworks, and profiled yet again. Knowledge gained on smaller-scale clusters might not be linearly applicable to larger clusters, but it is a start, imho.

I think it's going to take years of collaboration, with code, patches and profiles being shared, to "tame the beast". That said, I would certainly be happy to be proven wrong, and I look forward to ideas that simplify this problem's solution.



James



Tim

On Apr 11, 2015, at 1:05 PM, CCAAT <cc...@tampabay.rr.com> wrote:

Hello Tim,

Your approach seems most reasonable, particularly from an overarching
viewpoint. However, it occurs to me that as folks have several to many
different frameworks (distributed applications) running on a given Mesos
cluster, the optimization of resource allocation (utilization) may
ultimately need to be under some sort of tunable, dynamic scheme. Most
distributed applications, say one that runs for a few hours, will usually
not have a constant memory demand, so how can any static configuration
suffice for a dynamic mix of frequently changing distributed applications?
This problem is particularly amplified where Apache Spark, with its
"in-memory" resource demands, is very different from other frameworks that
may be active on the same cluster.

I really think we are just seeing the tip of the iceberg here
as these Mesos clusters grow, expand and take on a variety of problems,
or did I miss some already existing robustness in the code?


James



On 04/11/2015 12:29 PM, Tim Chen wrote:
(Adding spark user list)

Hi Tom,

If I understand correctly, you're saying that you're running into memory
problems because the scheduler is allocating too many CPUs and not
enough memory to accommodate them, right?

In the case of fine-grained mode I don't think that's a problem, since we
have a fixed amount of CPU and memory per task.
However, in coarse-grained mode you can run into that problem if you're
within the spark.cores.max limit, because memory is a fixed number.
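
To make that concrete, the knobs involved look roughly like this (values are only illustrative):

  # fine-grained mode (the current default): resources are claimed per task
  spark.mesos.coarse      false

  # coarse-grained mode: long-lived executors, memory fixed up front
  spark.mesos.coarse      true
  # cap on the total cores the job may take across the cluster
  spark.cores.max         24
  # fixed, regardless of how many cores the executor ends up with
  spark.executor.memory   8g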

I have a patch out to configure the maximum number of CPUs a coarse-grained
executor should use, and it also allows multiple executors in coarse-grained
mode. So you could, say, try to launch multiple executors of at most 4 cores
each, with spark.executor.memory (plus overhead, etc.) per executor, on a slave.
(https://github.com/apache/spark/pull/4027)

It also might be interesting to include a cores-to-memory multiplier, so that
with a larger number of cores we scale the memory by some factor, but I'm not
entirely sure that's intuitive to use, or that people would know what to set
it to, as it is likely to change with different workloads.
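
A sketch of what such a heuristic could look like is below. This is not the code in the patch; the object, parameter names and defaults are all made up.

  // Hypothetical sketch of a cores-to-memory multiplier, not actual Spark code.
  object ExecutorMemoryHeuristic {
    // baseOverheadMb covers fixed JVM/off-heap overhead;
    // mbPerCore is the tunable multiplier discussed above.
    def memoryForOffer(offeredCores: Int,
                       baseOverheadMb: Int = 384,
                       mbPerCore: Int = 1024): Int =
      baseOverheadMb + offeredCores * mbPerCore

    def main(args: Array[String]): Unit = {
      // an 8-core offer would be matched with roughly 8.4 GB instead of a fixed number
      println(memoryForOffer(offeredCores = 8)) // prints 8576
    }
  }

The awkward part is that mbPerCore is really a property of the workload rather than the cluster, which is what makes a single default hard to pick.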

Tim







On Sat, Apr 11, 2015 at 9:51 AM, Tom Arnfeld <t...@duedil.com
<mailto:t...@duedil.com>> wrote:

    We're running Spark 1.3.0 (with a couple of patches over the top for
    docker related bits).

    I don't think SPARK-4158 is related to what we're seeing; things do
    run fine on the cluster, given a ridiculously large executor memory
    configuration. As for SPARK-3535, although that looks useful, I think
    we're seeing something else.

    Put another way, the amount of memory required at any given time
    by the Spark JVM process is directly proportional to the amount of
    CPU it has, because more CPU means more tasks and more tasks mean
    more memory. Even if we're using coarse mode, the amount of executor
    memory should be proportional to the number of CPUs in the offer.
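
    To put rough, purely illustrative numbers on that: with spark.task.cpus
    at its default of 1, an executor handed 16 cores will run 16 tasks at
    once; if each task needs around 500 MB of headroom that's about 8 GB,
    while the same executor given a 4-core offer would only need about 2 GB.
    A single fixed spark.executor.memory value can't express that relationship.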

    On 11 April 2015 at 17:39, Brenden Matthews <bren...@diddyinc.com
    <mailto:bren...@diddyinc.com>> wrote:

        I ran into some issues with it a while ago, and submitted a
        couple PRs to fix it:

        https://github.com/apache/spark/pull/2401
        https://github.com/apache/spark/pull/3024

        Do these look relevant? What version of Spark are you running?

        On Sat, Apr 11, 2015 at 9:33 AM, Tom Arnfeld <t...@duedil.com
        <mailto:t...@duedil.com>> wrote:

            Hey,

            Not sure whether it's best to ask this on the Spark mailing
            list or the Mesos one, so I'll try here first :-)

            I'm having a bit of trouble with out-of-memory errors in my
            Spark jobs... it seems fairly odd to me that memory
            resources can only be set at the executor level, and not
            also at the task level. For example, as far as I can tell
            there's only a *spark.executor.memory* config option.

            Surely the memory requirements of a single executor are
            quite dramatically influenced by the number of concurrent
            tasks running? Given a shared cluster, I have no idea what %
            of an individual slave my executor is going to get, so I
            basically have to set the executor memory to a value that's
            correct when the whole machine is in use...
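
            Concretely, the only related knobs I can see are something like
            the following (values purely illustrative), and neither of them
            ties memory to the number of tasks that end up running
            concurrently:

                # sized for the worst case, i.e. the executor getting the whole slave
                spark.executor.memory   16g
                # cores reserved per running task; there is no per-task memory equivalent
                spark.task.cpus         1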

            Has anyone else running Spark on Mesos come across this, or
            maybe someone could correct my understanding of the config
            options?

            Thanks!

            Tom.


