Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-15 Thread David Greenberg
Cook launches shells with around one minute of latency. I believe there is
a project to reduce that to seconds.
On Thu, Jul 14, 2016 at 10:50 PM Rahul Palamuttam 
wrote:

> Hallelujah!
>
> We'll definitely take a look at cook.
> Right now we're observing that in both fine-grained and coarse-grained mode,
> jobs take quite a bit of time to even be staged by mesos.
>
> We're sitting there waiting on the interpreter/shell for quite a few
> minutes.
>
> On Jul 14, 2016, at 7:49 PM, David Greenberg 
> wrote:
>
> By true multitenancy, I mean preemption, so that if a new user connects to
> the cluster, their capacity is actually reclaimed and reallocated in
> minutes or seconds instead of hours.
> On Wed, Jul 13, 2016 at 7:11 PM Rahul Palamuttam 
> wrote:
>
>> Thanks David.
>> We will definitely take a look at Cook.
>>
>> I am curious what you mean by true multi-tenancy.
>>
>> Under coarse-grained mode with dynamic allocation enabled - what I see in
>> the mesos UI is that there are 3 tasks running by default (one on each of
>> the nodes we have).
>> I also see the coarse-grained executors being brought up.
>>
>> Another point is that I always see a spark-submit command being launched;
>> even if I kill that command it comes back up and the executors get
>> reallocated on the worker nodes.
>> However, I am able to launch multiple spark shells and have jobs run
>> concurrently - which we were very happy with.
>> Unfortunately, I don't understand why mesos only shows 3 tasks running. I
>> even see the spike in thread count when launching my jobs, but the task
>> count remains unchanged.
>> The mesos logs do show jobs coming in.
>> The three tasks just sit there in the webui - running.
>>
>> Is this what is expected?
>> Does running coarse-grained with dynamic allocation make mesos look at
>> each running executor as a different task?
>>
>>
>>
>>
>> On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg 
>> wrote:
>>
>>> You could also check out Cook from twosigma. It's open source on github,
>>> and offers true preemptive multitenancy with spark on Mesos, by
>>> intermediating the spark drivers to optimize the cluster overall.
>>> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam 
>>> wrote:
>>>
 Thank you Joseph.

 We'll try to explore coarse grained mode with dynamic allocation.

 On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu 
 wrote:

> Looks like you're running Spark in "fine-grained" mode (deprecated).
>
> (The Spark website appears to be down right now, so here's the doc on
> Github:)
>
> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>
> Note that while Spark tasks in fine-grained will relinquish cores as
>> they terminate, they will not relinquish memory, as the JVM does not give
>> memory back to the Operating System. Neither will executors terminate 
>> when
>> they're idle.
>
>
> You can follow some of the recommendations Spark has in that document
> for sharing resources, when using Mesos.
>
> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <
> rahulpala...@gmail.com> wrote:
>
>> Hi,
>>
>> Our team has been tackling multi-tenancy related issues with Mesos
>> for quite some time.
>>
>> The problem is that tasks aren't being allocated properly when
>> multiple applications are trying to launch a job. If we launch 
>> application
>> A, and soon after application B, application B waits pretty much till the
>> completion of application A for tasks to even be staged in Mesos. Right 
>> now
>> these applications are the spark-shell or the zeppelin interpreter.
>>
>> Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
>> different spark-shells results in the issue we're observing. One of the
>> counts waits (in fact we don't even see the tasks being staged in mesos)
>> until the current one finishes. This is the biggest issue we have been
>> experiencing and any help or advice would be greatly appreciated. We want
>> to
>> be able to launch multiple jobs concurrently on our cluster and share
>> resources appropriately.
>>
>> Another issue we see is that the java heap-space on the mesos
>> executor backend process is not being cleaned up once a job has finished 
>> in
>> the spark shell.
>> I've attached a png file of the jvisualvm output showing that the
>> heapspace is still allocated on a worker node. If I force the GC from
>> jvisualvm then nearly all of that memory gets cleaned up. This may be
>> because the spark-shell is still active - but if we've waited long enough
>> why doesn't GC just clean up the space? However, even after forcing GC 
>> the
>> mesos UI shows us that these resources are still being used.
>> There should be a way to bring down the memory utilization of the
>> executors once a task is finished.

Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-14 Thread Rahul Palamuttam
Hallelujah!

We'll definitely take a look at cook. 
Right now we're observing that in both fine-grained and coarse-grained mode,
jobs take quite a bit of time to even be staged by mesos.

We're sitting there waiting on the interpreter/shell for quite a few minutes.

> On Jul 14, 2016, at 7:49 PM, David Greenberg  wrote:
> 
> By true multitenancy, I mean preemption, so that if a new user connects to 
> the cluster, their capacity is actually reclaimed and reallocated in minutes 
> or seconds instead of hours. 
>> On Wed, Jul 13, 2016 at 7:11 PM Rahul Palamuttam  
>> wrote:
>> Thanks David.
>> We will definitely take a look at Cook.
>> 
>> I am curious what you mean by true multi-tenancy.
>> 
>> Under coarse-grained mode with dynamic allocation enabled - what I see in 
>> the mesos UI is that there are 3 tasks running by default (one on each of 
>> the nodes we have).
>> I also see the coarse-grained executors being brought up.
>> 
>> Another point is that I always see a spark-submit command being launched;
>> even if I kill that command it comes back up and the executors get
>> reallocated on the worker nodes.
>> However, I am able to launch multiple spark shells and have jobs run 
>> concurrently - which we were very happy with.
>> Unfortunately, I don't understand why mesos only shows 3 tasks running. I 
>> even see the spike in thread count when launching my jobs, but the task 
>> count remains unchanged.
>> The mesos logs do show jobs coming in.
>> The three tasks just sit there in the webui - running.
>> 
>> Is this what is expected?
>> Does running coarse-grained with dynamic allocation make mesos look at each
>> running executor as a different task?
>> 
>> 
>> 
>> 
>>> On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg  
>>> wrote:
>>> You could also check out Cook from twosigma. It's open source on github, 
>>> and offers true preemptive multitenancy with spark on Mesos, by 
>>> intermediating the spark drivers to optimize the cluster overall. 
 On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam  
 wrote:
 Thank you Joseph.
 
 We'll try to explore coarse grained mode with dynamic allocation. 
 
> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu  wrote:
> Looks like you're running Spark in "fine-grained" mode (deprecated).
> 
> (The Spark website appears to be down right now, so here's the doc on 
> Github:)
> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
> 
>> Note that while Spark tasks in fine-grained will relinquish cores as 
>> they terminate, they will not relinquish memory, as the JVM does not 
>> give memory back to the Operating System. Neither will executors 
>> terminate when they're idle.
> 
> You can follow some of the recommendations Spark has in that document for 
> sharing resources, when using Mesos. 
> 
>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam 
>>  wrote:
>> Hi,
>> 
>> Our team has been tackling multi-tenancy related issues with Mesos for 
>> quite some time.
>> 
>> The problem is that tasks aren't being allocated properly when multiple 
>> applications are trying to launch a job. If we launch application A, and 
>> soon after application B, application B waits pretty much till the 
>> completion of application A for tasks to even be staged in Mesos. Right 
>> now these applications are the spark-shell or the zeppelin interpreter. 
>> 
>> Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
>> different spark-shells results in the issue we're observing. One of the 
>> counts waits (in fact we don't even see the tasks being staged in mesos) 
>> until the current one finishes. This is the biggest issue we have been 
>> experiencing and any help or advice would be greatly appreciated. We want
>> to be able to launch multiple jobs concurrently on our cluster and share 
>> resources appropriately. 
>> 
>> Another issue we see is that the java heap-space on the mesos executor 
>> backend process is not being cleaned up once a job has finished in the 
>> spark shell. 
>> I've attached a png file of the jvisualvm output showing that the 
>> heapspace is still allocated on a worker node. If I force the GC from 
>> jvisualvm then nearly all of that memory gets cleaned up. This may be 
>> because the spark-shell is still active - but if we've waited long 
>> enough why doesn't GC just clean up the space? However, even after 
>> forcing GC the mesos UI shows us that these resources are still being 
>> used.
>> There should be a way to bring down the memory utilization of the 
>> executors once a task is finished. It shouldn't continue to have that 
>> memory allocated, even if a spark-shell is active on the driver.
>> 
>> We have mesos configured to use fine-grained mode. 
>> The following are parameters we have set in our spark-defaults.conf file.

Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-14 Thread David Greenberg
By true multitenancy, I mean preemption, so that if a new user connects to
the cluster, their capacity is actually reclaimed and reallocated in
minutes or seconds instead of hours.
On Wed, Jul 13, 2016 at 7:11 PM Rahul Palamuttam 
wrote:

> Thanks David.
> We will definitely take a look at Cook.
>
> I am curious what you mean by true multi-tenancy.
>
> Under coarse-grained mode with dynamic allocation enabled - what I see in
> the mesos UI is that there are 3 tasks running by default (one on each of
> the nodes we have).
> I also see the coarse-grained executors being brought up.
>
> Another point is that I always see a spark-submit command being launched;
> even if I kill that command it comes back up and the executors get
> reallocated on the worker nodes.
> However, I am able to launch multiple spark shells and have jobs run
> concurrently - which we were very happy with.
> Unfortunately, I don't understand why mesos only shows 3 tasks running. I
> even see the spike in thread count when launching my jobs, but the task
> count remains unchanged.
> The mesos logs do show jobs coming in.
> The three tasks just sit there in the webui - running.
>
> Is this what is expected?
> Does running coarse-grained with dynamic allocation make mesos look at each
> running executor as a different task?
>
>
>
>
> On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg 
> wrote:
>
>> You could also check out Cook from twosigma. It's open source on github,
>> and offers true preemptive multitenancy with spark on Mesos, by
>> intermediating the spark drivers to optimize the cluster overall.
>> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam 
>> wrote:
>>
>>> Thank you Joseph.
>>>
>>> We'll try to explore coarse grained mode with dynamic allocation.
>>>
>>> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu 
>>> wrote:
>>>
 Looks like you're running Spark in "fine-grained" mode (deprecated).

 (The Spark website appears to be down right now, so here's the doc on
 Github:)

 https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated

 Note that while Spark tasks in fine-grained will relinquish cores as
> they terminate, they will not relinquish memory, as the JVM does not give
> memory back to the Operating System. Neither will executors terminate when
> they're idle.


 You can follow some of the recommendations Spark has in that document
 for sharing resources, when using Mesos.

 On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <
 rahulpala...@gmail.com> wrote:

> Hi,
>
> Our team has been tackling multi-tenancy related issues with Mesos for
> quite some time.
>
> The problem is that tasks aren't being allocated properly when
> multiple applications are trying to launch a job. If we launch application
> A, and soon after application B, application B waits pretty much till the
> completion of application A for tasks to even be staged in Mesos. Right 
> now
> these applications are the spark-shell or the zeppelin interpreter.
>
> Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
> different spark-shells results in the issue we're observing. One of the
> counts waits (in fact we don't even see the tasks being staged in mesos)
> until the current one finishes. This is the biggest issue we have been
> experiencing and any help or advice would be greatly appreciated. We want to
> be able to launch multiple jobs concurrently on our cluster and share
> resources appropriately.
>
> Another issue we see is that the java heap-space on the mesos executor
> backend process is not being cleaned up once a job has finished in the
> spark shell.
> I've attached a png file of the jvisualvm output showing that the
> heapspace is still allocated on a worker node. If I force the GC from
> jvisualvm then nearly all of that memory gets cleaned up. This may be
> because the spark-shell is still active - but if we've waited long enough
> why doesn't GC just clean up the space? However, even after forcing GC the
> mesos UI shows us that these resources are still being used.
> There should be a way to bring down the memory utilization of the
> executors once a task is finished. It shouldn't continue to have that
> memory allocated, even if a spark-shell is active on the driver.
>
> We have mesos configured to use fine-grained mode.
> The following are parameters we have set in our spark-defaults.conf
> file.
>
>
> spark.eventLog.enabled   true
> spark.eventLog.dir   hdfs://frontend-system:8090/directory
> 
> spark.local.dir/data/cluster-local/SPARK_TMP
>
> spark.executor.memory50g
>
> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>>>

Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread Rahul Palamuttam
Thanks David.
We will definitely take a look at Cook.

I am curious what you mean by true multi-tenancy.

Under coarse-grained mode with dynamic allocation enabled - what I see in
the mesos UI is that there are 3 tasks running by default (one on each of
the nodes we have).
I also see the coarse-grained executors being brought up.

Another point is that I always see a spark-submit command being launched;
even if I kill that command it comes back up and the executors get
reallocated on the worker nodes.
However, I am able to launch multiple spark shells and have jobs run
concurrently - which we were very happy with.
Unfortunately, I don't understand why mesos only shows 3 tasks running. I
even see the spike in thread count when launching my jobs, but the task
count remains unchanged.
The mesos logs do show jobs coming in.
The three tasks just sit there in the webui - running.

Is this what is expected?
Does running coarse-grained with dynamic allocation make mesos look at each
running executor as a different task?




On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg 
wrote:

> You could also check out Cook from twosigma. It's open source on github,
> and offers true preemptive multitenancy with spark on Mesos, by
> intermediating the spark drivers to optimize the cluster overall.
> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam 
> wrote:
>
>> Thank you Joseph.
>>
>> We'll try to explore coarse grained mode with dynamic allocation.
>>
>> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu  wrote:
>>
>>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>>
>>> (The Spark website appears to be down right now, so here's the doc on
>>> Github:)
>>>
>>> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>>>
>>> Note that while Spark tasks in fine-grained will relinquish cores as
 they terminate, they will not relinquish memory, as the JVM does not give
 memory back to the Operating System. Neither will executors terminate when
 they're idle.
>>>
>>>
>>> You can follow some of the recommendations Spark has in that document
>>> for sharing resources, when using Mesos.
>>>
>>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <
>>> rahulpala...@gmail.com> wrote:
>>>
 Hi,

 Our team has been tackling multi-tenancy related issues with Mesos for
 quite some time.

 The problem is that tasks aren't being allocated properly when multiple
 applications are trying to launch a job. If we launch application A, and
 soon after application B, application B waits pretty much till the
 completion of application A for tasks to even be staged in Mesos. Right now
 these applications are the spark-shell or the zeppelin interpreter.

 Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
 different spark-shells results in the issue we're observing. One of the
 counts waits (in fact we don't even see the tasks being staged in mesos)
 until the current one finishes. This is the biggest issue we have been
 experiencing and any help or advice would be greatly appreciated. We want to
 be able to launch multiple jobs concurrently on our cluster and share
 resources appropriately.

 Another issue we see is that the java heap-space on the mesos executor
 backend process is not being cleaned up once a job has finished in the
 spark shell.
 I've attached a png file of the jvisualvm output showing that the
 heapspace is still allocated on a worker node. If I force the GC from
 jvisualvm then nearly all of that memory gets cleaned up. This may be
 because the spark-shell is still active - but if we've waited long enough
 why doesn't GC just clean up the space? However, even after forcing GC the
 mesos UI shows us that these resources are still being used.
 There should be a way to bring down the memory utilization of the
 executors once a task is finished. It shouldn't continue to have that
 memory allocated, even if a spark-shell is active on the driver.

 We have mesos configured to use fine-grained mode.
 The following are parameters we have set in our spark-defaults.conf
 file.


 spark.eventLog.enabled   true
 spark.eventLog.dir   hdfs://frontend-system:8090/directory
 
 spark.local.dir/data/cluster-local/SPARK_TMP

 spark.executor.memory50g

 spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
 spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
 spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
 
 spark.mesos.coarse  false

 Please let me know if there are any questions about our configuration.
 Any advice or experience the mesos community can share pertaining to
 issues with fine-grained mode would be greatly appreciated!

Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread David Greenberg
You could also check out Cook from twosigma. It's open source on github,
and offers true preemptive multitenancy with spark on Mesos, by
intermediating the spark drivers to optimize the cluster overall.
On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam 
wrote:

> Thank you Joseph.
>
> We'll try to explore coarse grained mode with dynamic allocation.
>
> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu  wrote:
>
>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>
>> (The Spark website appears to be down right now, so here's the doc on
>> Github:)
>>
>> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>>
>> Note that while Spark tasks in fine-grained will relinquish cores as they
>>> terminate, they will not relinquish memory, as the JVM does not give memory
>>> back to the Operating System. Neither will executors terminate when they're
>>> idle.
>>
>>
>> You can follow some of the recommendations Spark has in that document for
>> sharing resources, when using Mesos.
>>
>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <
>> rahulpala...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Our team has been tackling multi-tenancy related issues with Mesos for
>>> quite some time.
>>>
>>> The problem is that tasks aren't being allocated properly when multiple
>>> applications are trying to launch a job. If we launch application A, and
>>> soon after application B, application B waits pretty much till the
>>> completion of application A for tasks to even be staged in Mesos. Right now
>>> these applications are the spark-shell or the zeppelin interpreter.
>>>
>>> Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
>>> different spark-shells results in the issue we're observing. One of the
>>> counts waits (in fact we don't even see the tasks being staged in mesos)
>>> until the current one finishes. This is the biggest issue we have been
>>> experiencing and any help or advice would be greatly appreciated. We want to
>>> be able to launch multiple jobs concurrently on our cluster and share
>>> resources appropriately.
>>>
>>> Another issue we see is that the java heap-space on the mesos executor
>>> backend process is not being cleaned up once a job has finished in the
>>> spark shell.
>>> I've attached a png file of the jvisualvm output showing that the
>>> heapspace is still allocated on a worker node. If I force the GC from
>>> jvisualvm then nearly all of that memory gets cleaned up. This may be
>>> because the spark-shell is still active - but if we've waited long enough
>>> why doesn't GC just clean up the space? However, even after forcing GC the
>>> mesos UI shows us that these resources are still being used.
>>> There should be a way to bring down the memory utilization of the
>>> executors once a task is finished. It shouldn't continue to have that
>>> memory allocated, even if a spark-shell is active on the driver.
>>>
>>> We have mesos configured to use fine-grained mode.
>>> The following are parameters we have set in our spark-defaults.conf file.
>>>
>>>
>>> spark.eventLog.enabled   true
>>> spark.eventLog.dir   hdfs://frontend-system:8090/directory
>>> 
>>> spark.local.dir/data/cluster-local/SPARK_TMP
>>>
>>> spark.executor.memory50g
>>>
>>> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>>> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
>>> spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>>> 
>>> spark.mesos.coarse  false
>>>
>>> Please let me know if there are any questions about our configuration.
>>> Any advice or experience the mesos community can share pertaining to
>>> issues with fine-grained mode would be greatly appreciated!
>>>
>>> I would also like to sincerely apologize for my previous test message on
>>> the mailing list.
>>> It was an ill-conceived idea since we are in a bit of a time crunch and
>>> I needed to get this message posted. I forgot I needed to send a reply to
>>> the user-subscribers email for me to be listed, resulting in "message not
>>> sent" emails. I will not do that again.
>>>
>>> Thanks,
>>>
>>> Rahul Palamuttam
>>>
>>
>>
>


Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread Rahul Palamuttam
Thank you Joseph.

We'll try to explore coarse grained mode with dynamic allocation.

On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu  wrote:

> Looks like you're running Spark in "fine-grained" mode (deprecated).
>
> (The Spark website appears to be down right now, so here's the doc on
> Github:)
>
> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>
> Note that while Spark tasks in fine-grained will relinquish cores as they
>> terminate, they will not relinquish memory, as the JVM does not give memory
>> back to the Operating System. Neither will executors terminate when they're
>> idle.
>
>
> You can follow some of the recommendations Spark has in that document for
> sharing resources, when using Mesos.
>
> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam wrote:
>
>> Hi,
>>
>> Our team has been tackling multi-tenancy related issues with Mesos for
>> quite some time.
>>
>> The problem is that tasks aren't being allocated properly when multiple
>> applications are trying to launch a job. If we launch application A, and
>> soon after application B, application B waits pretty much till the
>> completion of application A for tasks to even be staged in Mesos. Right now
>> these applications are the spark-shell or the zeppelin interpreter.
>>
>> Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
>> different spark-shells results in the issue we're observing. One of the
>> counts waits (in fact we don't even see the tasks being staged in mesos)
>> until the current one finishes. This is the biggest issue we have been
>> experiencing and any help or advice would be greatly appreciated. We want to
>> be able to launch multiple jobs concurrently on our cluster and share
>> resources appropriately.
>>
>> Another issue we see is that the java heap-space on the mesos executor
>> backend process is not being cleaned up once a job has finished in the
>> spark shell.
>> I've attached a png file of the jvisualvm output showing that the
>> heapspace is still allocated on a worker node. If I force the GC from
>> jvisualvm then nearly all of that memory gets cleaned up. This may be
>> because the spark-shell is still active - but if we've waited long enough
>> why doesn't GC just clean up the space? However, even after forcing GC the
>> mesos UI shows us that these resources are still being used.
>> There should be a way to bring down the memory utilization of the
>> executors once a task is finished. It shouldn't continue to have that
>> memory allocated, even if a spark-shell is active on the driver.
>>
>> We have mesos configured to use fine-grained mode.
>> The following are parameters we have set in our spark-defaults.conf file.
>>
>>
>> spark.eventLog.enabled   true
>> spark.eventLog.dir   hdfs://frontend-system:8090/directory
>> 
>> spark.local.dir/data/cluster-local/SPARK_TMP
>>
>> spark.executor.memory50g
>>
>> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
>> spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>> 
>> spark.mesos.coarse  false
>>
>> Please let me know if there are any questions about our configuration.
>> Any advice or experience the mesos community can share pertaining to
>> issues with fine-grained mode would be greatly appreciated!
>>
>> I would also like to sincerely apologize for my previous test message on
>> the mailing list.
>> It was an ill-conceived idea since we are in a bit of a time crunch and I
>> needed to get this message posted. I forgot I needed to send a reply to
>> the user-subscribers email for me to be listed, resulting in "message not
>> sent" emails. I will not do that again.
>>
>> Thanks,
>>
>> Rahul Palamuttam
>>
>
>


Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread Joseph Wu
Looks like you're running Spark in "fine-grained" mode (deprecated).

(The Spark website appears to be down right now, so here's the doc on
Github:)
https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated

Note that while Spark tasks in fine-grained will relinquish cores as they
> terminate, they will not relinquish memory, as the JVM does not give memory
> back to the Operating System. Neither will executors terminate when they're
> idle.


You can follow some of the recommendations Spark has in that document for
sharing resources, when using Mesos.
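
For reference, a minimal spark-defaults.conf sketch of the coarse-grained plus
dynamic-allocation setup might look like the lines below; the executor counts,
idle timeout, and core cap are illustrative values rather than recommendations,
and on Mesos dynamic allocation also assumes the external shuffle service is
running on each agent:

spark.mesos.coarse                           true
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.maxExecutors         10
spark.dynamicAllocation.executorIdleTimeout  60s
spark.cores.max                              24

Capping spark.cores.max per application should also leave offers available for
a second shell, so its tasks can be staged instead of waiting behind the first
job.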

On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam 
wrote:

> Hi,
>
> Our team has been tackling multi-tenancy related issues with Mesos for
> quite some time.
>
> The problem is that tasks aren't being allocated properly when multiple
> applications are trying to launch a job. If we launch application A, and
> soon after application B, application B waits pretty much till the
> completion of application A for tasks to even be staged in Mesos. Right now
> these applications are the spark-shell or the zeppelin interpreter.
>
> Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
> different spark-shells results in the issue we're observing. One of the
> counts waits (in fact we don't even see the tasks being staged in mesos)
> until the current one finishes. This is the biggest issue we have been
> experiencing and any help or advice would be greatly appreciated. We want to
> be able to launch multiple jobs concurrently on our cluster and share
> resources appropriately.
>
> Another issue we see is that the java heap-space on the mesos executor
> backend process is not being cleaned up once a job has finished in the
> spark shell.
> I've attached a png file of the jvisualvm output showing that the
> heapspace is still allocated on a worker node. If I force the GC from
> jvisualvm then nearly all of that memory gets cleaned up. This may be
> because the spark-shell is still active - but if we've waited long enough
> why doesn't GC just clean up the space? However, even after forcing GC the
> mesos UI shows us that these resources are still being used.
> There should be a way to bring down the memory utilization of the
> executors once a task is finished. It shouldn't continue to have that
> memory allocated, even if a spark-shell is active on the driver.
>
> We have mesos configured to use fine-grained mode.
> The following are parameters we have set in our spark-defaults.conf file.
>
>
> spark.eventLog.enabled   true
> spark.eventLog.dir   hdfs://frontend-system:8090/directory
> 
> spark.local.dir/data/cluster-local/SPARK_TMP
>
> spark.executor.memory50g
>
> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
> spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
> 
> spark.mesos.coarse  false
>
> Please let me know if there are any questions about our configuration.
> Any advice or experience the mesos community can share pertaining to
> issues with fine-grained mode would be greatly appreciated!
>
> I would also like to sincerely apologize for my previous test message on
> the mailing list.
> It was an ill-conceived idea since we are in a bit of a time crunch and I
> needed to get this message posted. I forgot I needed to send a reply to
> the user-subscribers email for me to be listed, resulting in "message not
> sent" emails. I will not do that again.
>
> Thanks,
>
> Rahul Palamuttam
>


Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread Rahul Palamuttam
Hi,

Our team has been tackling multi-tenancy related issues with Mesos for
quite some time.

The problem is that tasks aren't being allocated properly when multiple
applications are trying to launch a job. If we launch application A, and
soon after application B, application B waits pretty much till the
completion of application A for tasks to even be staged in Mesos. Right now
these applications are the spark-shell or the zeppelin interpreter.

Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
different spark-shells results in the issue we're observing. One of the
counts waits (in fact we don't even see the tasks being staged in mesos)
until the current one finishes. This is the biggest issue we have been
experiencing and any help or advice would be greatly appreciated. We want to
be able to launch multiple jobs concurrently on our cluster and share
resources appropriately.
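
To make the two-shell scenario concrete, here is a sketch; the artificially
slow job in the first shell is only illustrative and not taken from our actual
workload:

// Shell A (first user): a deliberately slow job that keeps the offered
// resources occupied (the Thread.sleep is purely illustrative)
sc.parallelize(1 to 1000, 100).map { i => Thread.sleep(1000); i }.reduce(_ + _)

// Shell B (second user, started shortly after): the simple reduce from above;
// in fine-grained mode its tasks are not even staged until shell A's job ends
sc.parallelize(1 to 1000).reduce(_ + _)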

Another issue we see is that the java heap-space on the mesos executor
backend process is not being cleaned up once a job has finished in the
spark shell.
I've attached a png file of the jvisualvm output showing that the heapspace
is still allocated on a worker node. If I force the GC from jvisualvm then
nearly all of that memory gets cleaned up. This may be because the
spark-shell is still active - but if we've waited long enough why doesn't
GC just clean up the space? However, even after forcing GC the mesos UI
shows us that these resources are still being used.
There should be a way to bring down the memory utilization of the executors
once a task is finished. It shouldn't continue to have that memory
allocated, even if a spark-shell is active on the driver.

We have mesos configured to use fine-grained mode.
The following are parameters we have set in our spark-defaults.conf file.


spark.eventLog.enabled   true
spark.eventLog.dir   hdfs://frontend-system:8090/directory

spark.local.dir/data/cluster-local/SPARK_TMP

spark.executor.memory50g

spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz

spark.mesos.coarse  false

Please let me know if there are any questions about our configuration.
Any advice or experience the mesos community can share pertaining to issues
with fine-grained mode would be greatly appreciated!

I would also like to sincerely apologize for my previous test message on
the mailing list.
It was an ill-conceived idea since we are in a bit of a time crunch and I
needed to get this message posted. I forgot I needed to send a reply to
the user-subscribers email for me to be listed, resulting in "message not
sent" emails. I will not do that again.

Thanks,

Rahul Palamuttam