Re: Initial Design Document Apache Mesos Federation (JIRA 3548)

2016-07-13 Thread Jeff Schroeder
Would this mean introducing Golang, and as a result Consul, into Mesos
proper? Seems like a bit of an odd dependency when everything currently
uses existing ASF projects.

On Wed, Jul 13, 2016 at 5:11 PM, DhilipKumar Sankaranarayanan <
s.dhilipku...@gmail.com> wrote:

> Hi All,
>
> Please find the initial version of the Design Document for Federating
> Mesos Clusters.
>
>
> https://docs.google.com/document/d/1U4IY_ObAXUPhtTa-0Rw_5zQxHDRnJFe5uFNOQ0VUcLg/edit?usp=sharing
>
> We at Huawei have been working on this federation project for the past few
> months.  We also got an opportunity to present it at the recent MesosCon
> 2016. Based on the discussions and feedback we have received so far, we
> have greatly simplified the design.
>
> Also, I see that no one is assigned to this JIRA right now; could I get it
> assigned to myself? It would be great to know if anyone is willing to
> shepherd this too.
>
> I would also like to bring this up in the community Sync that happens
> tomorrow.
>
> We would love to hear your thoughts, and we would be glad to collaborate
> with you on the implementation.
>
> Regards,
> Dhilip
>
>
> Reference:
> JIRA: https://issues.apache.org/jira/browse/MESOS-3548
> Slides:
> http://www.slideshare.net/mKrishnaKumar1/federated-mesos-clusters-for-global-data-center-designs
> Video :
> https://www.youtube.com/watch?v=kqyVQzwwD5E&index=17&list=PLGeM09tlguZQVL7ZsfNMffX9h1rGNVqnC
>
>


-- 
Jeff Schroeder

Don't drink and derive, alcohol and analysis don't mix.
http://www.digitalprognosis.com


Re: Initial Design Document Apache Mesos Federation (JIRA 3548)

2016-07-13 Thread Alexander Gallego
This is very cool work. I had a chat with another company thinking about
doing the exact same thing.

I think the proposal is missing several details that make it hard to
evaluate on paper (I also saw your presentation).


1) Failure semantics seem to be unchanged in the proposed design.


As a framework author, how do you suggest dealing with tasks on multiple
clusters? I feel like there have to be richer semantics about the task, at
least at the mesos.proto level, where the state is something like
STATUS_FAILED_DC_OUTAGE.

We respawn operators, and having this information may allow me as a
framework author to wait a little longer before declaring a task dead
(KILLED/FAILED/LOST) if I am going to respawn it in a different data center.

Would love to get details on how you were thinking of extending the failure
semantics for multiple data centers.
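
To make the ask concrete, here is a hypothetical mesos.proto-style sketch;
the new state and its tag number are invented for illustration and do not
exist upstream:

    // Hypothetical only: an extra terminal state that distinguishes a
    // whole-data-center outage from an ordinary task failure.
    enum TaskState {
      TASK_RUNNING = 1;
      TASK_FAILED  = 3;
      TASK_KILLED  = 4;
      TASK_LOST    = 5;
      // ... other existing states elided ...

      // Task is presumed lost because its data center is unreachable;
      // frameworks may choose to wait before rescheduling elsewhere.
      TASK_FAILED_DC_OUTAGE = 100;
    }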


2) Can you share more details about the allocator modules?


After reading the proposal, I understand it as follows.


[ gossiper ] -> [ allocator module ] -> [mesos master]


Is this correct? If so, are you saying that you can tell the Mesos master
to run a task that was fulfilled by a framework in a different data center?

Is the constraint that you are forced to run a scheduler per framework in
each data center?



3) High availability


High availability in a multi-DC layout means something entirely different.
Are all frameworks now on standby on every other cluster? The problem I
see with this is that the metadata stored by each framework to support HA
now has to span multiple DCs. It would be nice to extend/expose an API at
the Mesos level for setting state.

a) In the normal Mesos layout, this key=value data store would be
ZooKeeper.

b) In the multi-DC layout it could be a ZooKeeper per data center, but then
one can piggyback on the gossiper to replicate that state to the other
data centers.
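
To illustrate the kind of API surface I mean, a rough proto-style sketch;
nothing like this exists in mesos.proto today, and all names here are
invented:

    // Hypothetical key=value state API exposed at the Mesos level.
    message SetStateRequest {
      required string key = 1;
      required bytes value = 2;
      // Ask the gossiper to fan this entry out to the other DCs.
      optional bool replicate_cross_dc = 3 [default = false];
    }

    message GetStateRequest {
      required string key = 1;
    }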


4) Metrics/monitoring: probably down the line, but it would be good to also
piggyback some of the Mesos master endpoints on the gossip architecture.



Again, very cool work. I would love to get more details on the actual
implementation you built, plus your thoughts on the points above.

- Alex







On Wed, Jul 13, 2016 at 6:11 PM, DhilipKumar Sankaranarayanan <
s.dhilipku...@gmail.com> wrote:

> Hi All,
>
> Please find the initial version of the Design Document for Federating
> Mesos Clusters.
>
>
> https://docs.google.com/document/d/1U4IY_ObAXUPhtTa-0Rw_5zQxHDRnJFe5uFNOQ0VUcLg/edit?usp=sharing
>
> We at Huawei have been working on this federation project for the past few
> months.  We also got an opportunity to present it at the recent MesosCon
> 2016. Based on the discussions and feedback we have received so far, we
> have greatly simplified the design.
>
> Also, I see that no one is assigned to this JIRA right now; could I get it
> assigned to myself? It would be great to know if anyone is willing to
> shepherd this too.
>
> I would also like to bring this up in the community Sync that happens
> tomorrow.
>
> We would love to hear your thoughts, and we would be glad to collaborate
> with you on the implementation.
>
> Regards,
> Dhilip
>
>
> Reference:
> JIRA: https://issues.apache.org/jira/browse/MESOS-3548
> Slides:
> http://www.slideshare.net/mKrishnaKumar1/federated-mesos-clusters-for-global-data-center-designs
> Video :
> https://www.youtube.com/watch?v=kqyVQzwwD5E&index=17&list=PLGeM09tlguZQVL7ZsfNMffX9h1rGNVqnC
>
>


-- 





Alexander Gallego
Co-Founder & CTO


Re: What's the official pronunciation of mesos?

2016-07-13 Thread Paul
Sadly, I don't understand a whole lot about Mesos, but I did learn Ancient 
Greek in college, taught it for a couple of years, and have even translated 
parts of Homer's Iliad.

 μέσος

The 'e' (epsilon) in 'Mesos' would be pronounced like the 'e' in the English 
word 'pet'. The 'o' (omicron) as in 'hot'.

But, at least to English ears, that pronunciation feels a bit stilted. So I 
think Rodrick's right to sound the 'o' as long, as in 'tone'.

-Paul

> On Jul 13, 2016, at 9:12 PM, Rodrick Brown  wrote:
> 
> Mess-O's 
> 
> 
> 
> 
> 
> On Wed, Jul 13, 2016 at 7:56 PM -0400, "zhiwei"  wrote:
> 
>> Hi,
>> 
>> I saw in some videos, different people pronounce 'mesos' differently.
>> 
>> Can someone add the official pronunciation of mesos to Wikipedia?
> 


Re: OS X latency issue when run as a plist

2016-07-13 Thread Rinaldo Digiorgio

> On Jul 13, 2016, at 9:20 PM, Rodrick Brown  wrote:
> 
> Have you tried using something like supervisord? Or the slew of other process 
> launchers available for *nix. 
> 
Thanks, I had no idea that there were alternatives to launchd. I will look
into it and report back for the next person.
> Check brew.
> 
> I would look to that as an interim solution if the plist method remains 
> problematic. 
> 
> 
> 
> 
> On Wed, Jul 13, 2016 at 7:44 AM -0400, "Rinaldo Digiorgio"  wrote:
> 
> Hi,
> 
>   There have been prior discussions on the list about the OS X Latency 
> issue. I had filed a bug here:
> 
>   https://issues.apache.org/jira/browse/MESOS-5589 
> 
> 
>   We have found that the root cause is starting the mesos application in 
> the background using a plist entry.  If you launch the mesos agent from a 
> terminal it works fine.  We have tried to get a plist (not an app) to work 
> and none of the documented settings in launchd remove the latency issue.
> 
>   
> https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man5/launchd.plist.5.html
>  
> 
> 
>   The settings we tried are:
> 
>ProcessType 
>  This optional key describes, at a high level, the intended purpose of 
> the job.  The system will apply
>  resource limits based on what kind of job it is. If left unspecified, 
> the system will apply light
>  resource limits to the job, throttling its CPU usage and I/O bandwidth. 
> The following are valid values:
> 
>Background
>Background jobs are generally processes that do work that was not 
> directly requested by the user.
>The resource limits applied to Background jobs are intended to 
> prevent them from disrupting the
>user experience.
> 
>Standard
>Standard jobs are equivalent to no ProcessType being set.
> 
>Adaptive
>Adaptive jobs move between the Background and Interactive 
> classifications based on activity over
>XPC connections. See xpc_transaction_begin(3) for details.
> 
>Interactive
>Interactive jobs run with the same resource limitations as apps,
>that is to say, none. Interactive jobs are critical to maintaining
>a responsive user experience, and this key should only be used if an
>app's ability to be responsive depends on it, and cannot be made
>Adaptive.
> 
> 
> The mesos agent works correctly if you start it as a GUI app. This leaves an
> icon on the screen. One can live with it, but it is an indication of the lack
> of proper documentation from Apple, and/or an utter lack of understanding of
> background applications on the desktop OS known as OS X. If someone has a
> plist solution please share it. It is not reasonable to start mesos agents
> from a terminal session or cron; the operating system should manage startup
> and shutdown.
> 
> Rinaldo
> 
> 
> 
>   
> 



Re: OS X latency issue when run as a plist

2016-07-13 Thread Rodrick Brown
Have you tried using something like supervisord? Or the slew of other process 
launchers available for *nix. 
Check brew.
I would look to that as an interim solution if the plist method remains 
problematic. 





On Wed, Jul 13, 2016 at 7:44 AM -0400, "Rinaldo Digiorgio"  wrote:

Hi,
There have been prior discussions on the list about the OS X Latency 
issue. I had filed a bug here:
https://issues.apache.org/jira/browse/MESOS-5589
We have found that the root cause is starting the mesos application in 
the background using a plist entry.  If you launch the mesos agent from a 
terminal it works fine.  We have tried to get a plist (not an app) to work and 
none of the documented settings in launchd remove the latency issue.

https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man5/launchd.plist.5.html
The settings we tried are:
 ProcessType  This optional key describes, at a high level, 
the intended purpose of the job.  The system will apply
 resource limits based on what kind of job it is. If left unspecified, the 
system will apply light
 resource limits to the job, throttling its CPU usage and I/O bandwidth. 
The following are valid values:

   Background
   Background jobs are generally processes that do work that was not 
directly requested by the user.
   The resource limits applied to Background jobs are intended to 
prevent them from disrupting the
   user experience.

   Standard
   Standard jobs are equivalent to no ProcessType being set.

   Adaptive
   Adaptive jobs move between the Background and Interactive 
classifications based on activity over
   XPC connections. See xpc_transaction_begin(3) for details.

   Interactive
   Interactive jobs run with the same resource limitations as apps,
   that is to say, none. Interactive jobs are critical to maintaining a
   responsive user experience, and this key should only be used if an
   app's ability to be responsive depends on it, and cannot be made
   Adaptive.


The mesos agent works correctly if you start it as a GUI app. This leaves an
icon on the screen. One can live with it, but it is an indication of the lack
of proper documentation from Apple, and/or an utter lack of understanding of
background applications on the desktop OS known as OS X. If someone has a
plist solution please share it. It is not reasonable to start mesos agents
from a terminal session or cron; the operating system should manage startup
and shutdown.
Rinaldo










Re: What's the official pronunciation of mesos?

2016-07-13 Thread Rodrick Brown
Mess-O's 





On Wed, Jul 13, 2016 at 7:56 PM -0400, "zhiwei"  wrote:

Hi,


I saw in some videos, different people pronounce 'mesos' differently.


Can someone add the official pronunciation of mesos to Wikipedia?









Re: What's the official pronunciation of mesos?

2016-07-13 Thread Dario Rexin
Mesos comes from the Greek word ‘mesos’, which means ‘middle’, so I guess that 
would be a valid pronunciation ;).

> On Jul 13, 2016, at 5:18 PM, Jie Yu  wrote:
> 
> Looks like we don't have an official pronunciation. The previous guideline 
> was "say it how you like it".
> 
> Also, I checked some other projects, like the Kubernetes wiki; they don't
> have an official pronunciation either.
> 
> - Jie
> 
> On Wed, Jul 13, 2016 at 4:56 PM, zhiwei  wrote:
> Hi,
> 
> I saw in some videos, different people pronounce 'mesos' differently.
> 
> Can someone add the official pronunciation of mesos to Wikipedia?
> 
> 



Re: What's the official pronunciation of mesos?

2016-07-13 Thread Jie Yu
Looks like we don't have an official pronunciation. The previous guideline
was "say it how you like it".

Also, I checked some other projects, like the Kubernetes wiki; they don't
have an official pronunciation either.

- Jie

On Wed, Jul 13, 2016 at 4:56 PM, zhiwei  wrote:

> Hi,
>
> I saw in some videos, different people pronounce 'mesos' differently.
>
> Can someone add the official pronunciation of mesos to Wikipedia?
>


Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread Rahul Palamuttam
Thanks David.
We will definitely take a look at Cook.

I am curious what you mean by true multi-tenancy.

Under coarse-grained mode with dynamic allocation enabled, what I see in
the mesos UI is that there are 3 tasks running by default (one on each of
the nodes we have).
I also see the coarse-grained executors being brought up.

Another point is that I always see a spark-submit command being launched;
even if I kill that command, it comes back up and the executors get
reallocated on the worker nodes.
However, I am able to launch multiple spark shells and have jobs run
concurrently, which we were very happy with.
Unfortunately, I don't understand why mesos only shows 3 tasks running. I
even see the spike in thread count when launching my jobs, but the task
count remains unchanged.
The mesos logs do show jobs coming in.
The three tasks just sit there in the web UI, running.

Is this what is expected?
Does running coarse-grained with dynamic allocation make mesos look at each
running executor as a different task?




On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg 
wrote:

> You could also check out Cook from twosigma. It's open source on github,
> and offers true preemptive multitenancy with spark on Mesos, by
> intermediating the spark drivers to optimize the cluster overall.
> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam 
> wrote:
>
>> Thank you Joseph.
>>
>> We'll try to explore coarse grained mode with dynamic allocation.
>>
>> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu  wrote:
>>
>>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>>
>>> (The Spark website appears to be down right now, so here's the doc on
>>> Github:)
>>>
>>> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>>>
>>> Note that while Spark tasks in fine-grained will relinquish cores as
 they terminate, they will not relinquish memory, as the JVM does not give
 memory back to the Operating System. Neither will executors terminate when
 they're idle.
>>>
>>>
>>> You can follow some of the recommendations Spark has in that document
>>> for sharing resources, when using Mesos.
>>>
>>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <
>>> rahulpala...@gmail.com> wrote:
>>>
 Hi,

 Our team has been tackling multi-tenancy related issues with Mesos for
 quite some time.

 The problem is that tasks aren't being allocated properly when multiple
 applications are trying to launch a job. If we launch application A, and
 soon after application B, application B waits pretty much till the
 completion of application A for tasks to even be staged in Mesos. Right now
 these applications are the spark-shell or the zeppelin interpreter.

 Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
 different spark-shells results in the issue we're observing. One of the
 counts waits (in fact we don't even see the tasks being staged in mesos)
 until the current one finishes. This is the biggest issue we have been
 experiencing, and any help or advice would be greatly appreciated. We want to
 be able to launch multiple jobs concurrently on our cluster and share
 resources appropriately.

 Another issue we see is that the java heap-space on the mesos executor
 backend process is not being cleaned up once a job has finished in the
 spark shell.
 I've attached a png file of the jvisualvm output showing that the
 heapspace is still allocated on a worker node. If I force the GC from
 jvisualvm then nearly all of that memory gets cleaned up. This may be
 because the spark-shell is still active - but if we've waited long enough
 why doesn't GC just clean up the space? However, even after forcing GC the
 mesos UI shows us that these resources are still being used.
 There should be a way to bring down the memory utilization of the
 executors once a task is finished. It shouldn't continue to have that
 memory allocated, even if a spark-shell is active on the driver.

 We have mesos configured to use fine-grained mode.
 The following are parameters we have set in our spark-defaults.conf
 file.


 spark.eventLog.enabled   true
 spark.eventLog.dir   hdfs://frontend-system:8090/directory
 
 spark.local.dir/data/cluster-local/SPARK_TMP

 spark.executor.memory50g

 spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
 spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
 spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
 
 spark.mesos.coarse  false

 Please let me know if 

What's the official pronunciation of mesos?

2016-07-13 Thread zhiwei
Hi,

I saw in some videos, different people pronounce 'mesos' differently.

Can someone add the official pronunciation of mesos to Wikipedia?


Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread David Greenberg
You could also check out Cook from twosigma. It's open source on github,
and offers true preemptive multitenancy with spark on Mesos, by
intermediating the spark drivers to optimize the cluster overall.
On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam 
wrote:

> Thank you Joseph.
>
> We'll try to explore coarse grained mode with dynamic allocation.
>
> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu  wrote:
>
>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>
>> (The Spark website appears to be down right now, so here's the doc on
>> Github:)
>>
>> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>>
>> Note that while Spark tasks in fine-grained will relinquish cores as they
>>> terminate, they will not relinquish memory, as the JVM does not give memory
>>> back to the Operating System. Neither will executors terminate when they're
>>> idle.
>>
>>
>> You can follow some of the recommendations Spark has in that document for
>> sharing resources, when using Mesos.
>>
>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <
>> rahulpala...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Our team has been tackling multi-tenancy related issues with Mesos for
>>> quite some time.
>>>
>>> The problem is that tasks aren't being allocated properly when multiple
>>> applications are trying to launch a job. If we launch application A, and
>>> soon after application B, application B waits pretty much till the
>>> completion of application A for tasks to even be staged in Mesos. Right now
>>> these applications are the spark-shell or the zeppelin interpreter.
>>>
>>> Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
>>> different spark-shells results in the issue we're observing. One of the
>>> counts waits (in fact we don't even see the tasks being staged in mesos)
>>> until the current one finishes. This is the biggest issue we have been
>>> experiencing, and any help or advice would be greatly appreciated. We want to
>>> be able to launch multiple jobs concurrently on our cluster and share
>>> resources appropriately.
>>>
>>> Another issue we see is that the java heap-space on the mesos executor
>>> backend process is not being cleaned up once a job has finished in the
>>> spark shell.
>>> I've attached a png file of the jvisualvm output showing that the
>>> heapspace is still allocated on a worker node. If I force the GC from
>>> jvisualvm then nearly all of that memory gets cleaned up. This may be
>>> because the spark-shell is still active - but if we've waited long enough
>>> why doesn't GC just clean up the space? However, even after forcing GC the
>>> mesos UI shows us that these resources are still being used.
>>> There should be a way to bring down the memory utilization of the
>>> executors once a task is finished. It shouldn't continue to have that
>>> memory allocated, even if a spark-shell is active on the driver.
>>>
>>> We have mesos configured to use fine-grained mode.
>>> The following are parameters we have set in our spark-defaults.conf file.
>>>
>>>
>>> spark.eventLog.enabled   true
>>> spark.eventLog.dir   hdfs://frontend-system:8090/directory
>>> 
>>> spark.local.dir/data/cluster-local/SPARK_TMP
>>>
>>> spark.executor.memory50g
>>>
>>> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>>> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
>>> spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>>> 
>>> spark.mesos.coarse  false
>>>
>>> Please let me know if there are any questions about our configuration.
>>> Any advice or experience the mesos community can share pertaining to
>>> issues with fine-grained mode would be greatly appreciated!
>>>
>>> I would also like to sincerely apologize for my previous test message on
>>> the mailing list.
>>> It was an ill-conceived idea; we are in a bit of a time crunch and
>>> I needed to get this message posted. I forgot I needed to send a reply to
>>> the user-subscribe address to be listed, which resulted in "message not
>>> sent" emails. I will not do that again.
>>>
>>> Thanks,
>>>
>>> Rahul Palamuttam
>>>
>>
>>
>


Re: test

2016-07-13 Thread Vinod Kone
Don't sweat about the test email. Not a big deal. Welcome to the community!

On Wed, Jul 13, 2016 at 1:51 PM, Rahul Palamuttam 
wrote:

> I'm truly sorry.
> I just kept getting several "message denied" errors, until I realized I
> needed to send a reply to user-subscribe.
> I will not do that again.
>
>
> On Wed, Jul 13, 2016 at 11:57 AM, daemeon reiydelle 
> wrote:
>
>> Why are you wasting our time with this? Lame.
>>
>>
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Wed, Jul 13, 2016 at 11:56 AM, Rahul Palamuttam <
>> rahulpala...@gmail.com> wrote:
>>
>>>
>>>
>>
>


Initial Design Document Apache Mesos Federation (JIRA 3548)

2016-07-13 Thread DhilipKumar Sankaranarayanan
Hi All,

Please find the initial version of the Design Document for Federating Mesos Clusters.

https://docs.google.com/document/d/1U4IY_ObAXUPhtTa-0Rw_5zQxHDRnJFe5uFNOQ0VUcLg/edit?usp=sharing

We at Huawei have been working on this federation project for the past few
months.  We also got an opportunity to present it at the recent MesosCon
2016. Based on the discussions and feedback we have received so far, we
have greatly simplified the design.

Also, I see that no one is assigned to this JIRA right now; could I get it
assigned to myself? It would be great to know if anyone is willing to
shepherd this too.

I would also like to bring this up in the community Sync that happens
tomorrow.

We would love to hear your thoughts, and we would be glad to collaborate
with you on the implementation.

Regards,
Dhilip


Reference:
JIRA: https://issues.apache.org/jira/browse/MESOS-3548
Slides:
http://www.slideshare.net/mKrishnaKumar1/federated-mesos-clusters-for-global-data-center-designs
Video :
https://www.youtube.com/watch?v=kqyVQzwwD5E&index=17&list=PLGeM09tlguZQVL7ZsfNMffX9h1rGNVqnC


Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread Rahul Palamuttam
Thank you Joseph.

We'll try to explore coarse grained mode with dynamic allocation.

On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu  wrote:

> Looks like you're running Spark in "fine-grained" mode (deprecated).
>
> (The Spark website appears to be down right now, so here's the doc on
> Github:)
>
> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>
> Note that while Spark tasks in fine-grained will relinquish cores as they
>> terminate, they will not relinquish memory, as the JVM does not give memory
>> back to the Operating System. Neither will executors terminate when they're
>> idle.
>
>
> You can follow some of the recommendations Spark has in that document for
> sharing resources, when using Mesos.
>
> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam  wrote:
>
>> Hi,
>>
>> Our team has been tackling multi-tenancy related issues with Mesos for
>> quite some time.
>>
>> The problem is that tasks aren't being allocated properly when multiple
>> applications are trying to launch a job. If we launch application A, and
>> soon after application B, application B waits pretty much till the
>> completion of application A for tasks to even be staged in Mesos. Right now
>> these applications are the spark-shell or the zeppelin interpreter.
>>
>> Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
>> different spark-shells results in the issue we're observing. One of the
>> counts waits (in fact we don't even see the tasks being staged in mesos)
>> until the current one finishes. This is the biggest issue we have been
>> experiencing, and any help or advice would be greatly appreciated. We want to
>> be able to launch multiple jobs concurrently on our cluster and share
>> resources appropriately.
>>
>> Another issue we see is that the java heap-space on the mesos executor
>> backend process is not being cleaned up once a job has finished in the
>> spark shell.
>> I've attached a png file of the jvisualvm output showing that the
>> heapspace is still allocated on a worker node. If I force the GC from
>> jvisualvm then nearly all of that memory gets cleaned up. This may be
>> because the spark-shell is still active - but if we've waited long enough
>> why doesn't GC just clean up the space? However, even after forcing GC the
>> mesos UI shows us that these resources are still being used.
>> There should be a way to bring down the memory utilization of the
>> executors once a task is finished. It shouldn't continue to have that
>> memory allocated, even if a spark-shell is active on the driver.
>>
>> We have mesos configured to use fine-grained mode.
>> The following are parameters we have set in our spark-defaults.conf file.
>>
>>
>> spark.eventLog.enabled   true
>> spark.eventLog.dir   hdfs://frontend-system:8090/directory
>> 
>> spark.local.dir/data/cluster-local/SPARK_TMP
>>
>> spark.executor.memory50g
>>
>> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
>> spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>> 
>> spark.mesos.coarse  false
>>
>> Please let me know if there are any questions about our configuration.
>> Any advice or experience the mesos community can share pertaining to
>> issues with fine-grained mode would be greatly appreciated!
>>
>> I would also like to sincerely apologize for my previous test message on
>> the mailing list.
>> It was an ill-conceived idea; we are in a bit of a time crunch and I
>> needed to get this message posted. I forgot I needed to send a reply to
>> the user-subscribe address to be listed, which resulted in "message not
>> sent" emails. I will not do that again.
>>
>> Thanks,
>>
>> Rahul Palamuttam
>>
>
>


Re: Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread Joseph Wu
Looks like you're running Spark in "fine-grained" mode (deprecated).

(The Spark website appears to be down right now, so here's the doc on
Github:)
https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated

Note that while Spark tasks in fine-grained will relinquish cores as they
> terminate, they will not relinquish memory, as the JVM does not give memory
> back to the Operating System. Neither will executors terminate when they're
> idle.


You can follow some of the recommendations Spark has in that document for
sharing resources, when using Mesos.
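
For reference, a minimal spark-defaults.conf sketch of the coarse-grained
plus dynamic-allocation direction (property names are the standard Spark
1.6 ones; the executor bounds are illustrative only, and on Mesos dynamic
allocation additionally requires the external shuffle service running on
each agent, per the Spark docs):

    spark.mesos.coarse                    true
    spark.dynamicAllocation.enabled       true
    # Dynamic allocation needs the external shuffle service so executors
    # can be torn down without losing shuffle files.
    spark.shuffle.service.enabled         true
    # Illustrative bounds only -- tune for your cluster.
    spark.dynamicAllocation.minExecutors  1
    spark.dynamicAllocation.maxExecutors  10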

On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam 
wrote:

> Hi,
>
> Our team has been tackling multi-tenancy related issues with Mesos for
> quite some time.
>
> The problem is that tasks aren't being allocated properly when multiple
> applications are trying to launch a job. If we launch application A, and
> soon after application B, application B waits pretty much till the
> completion of application A for tasks to even be staged in Mesos. Right now
> these applications are the spark-shell or the zeppelin interpreter.
>
> Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
> different spark-shells results in the issue we're observing. One of the
> counts waits (in fact we don't even see the tasks being staged in mesos)
> until the current one finishes. This is the biggest issue we have been
> experiencing, and any help or advice would be greatly appreciated. We want to
> be able to launch multiple jobs concurrently on our cluster and share
> resources appropriately.
>
> Another issue we see is that the java heap-space on the mesos executor
> backend process is not being cleaned up once a job has finished in the
> spark shell.
> I've attached a png file of the jvisualvm output showing that the
> heapspace is still allocated on a worker node. If I force the GC from
> jvisualvm then nearly all of that memory gets cleaned up. This may be
> because the spark-shell is still active - but if we've waited long enough
> why doesn't GC just clean up the space? However, even after forcing GC the
> mesos UI shows us that these resources are still being used.
> There should be a way to bring down the memory utilization of the
> executors once a task is finished. It shouldn't continue to have that
> memory allocated, even if a spark-shell is active on the driver.
>
> We have mesos configured to use fine-grained mode.
> The following are parameters we have set in our spark-defaults.conf file.
>
>
> spark.eventLog.enabled   true
> spark.eventLog.dir   hdfs://frontend-system:8090/directory
> 
> spark.local.dir/data/cluster-local/SPARK_TMP
>
> spark.executor.memory50g
>
> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
> spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
> 
> spark.mesos.coarse  false
>
> Please let me know if there are any questions about our configuration.
> Any advice or experience the mesos community can share pertaining to
> issues with fine-grained mode would be greatly appreciated!
>
> I would also like to sincerely apologize for my previous test message on
> the mailing list.
> It was an ill-conceived idea; we are in a bit of a time crunch and I
> needed to get this message posted. I forgot I needed to send a reply to
> the user-subscribe address to be listed, which resulted in "message not
> sent" emails. I will not do that again.
>
> Thanks,
>
> Rahul Palamuttam
>


Mesos fine-grained multi-user mode failed to allocate tasks

2016-07-13 Thread Rahul Palamuttam
Hi,

Our team has been tackling multi-tenancy related issues with Mesos for
quite some time.

The problem is that tasks aren't being allocated properly when multiple
applications are trying to launch a job. If we launch application A, and
soon after application B, application B waits pretty much till the
completion of application A for tasks to even be staged in Mesos. Right now
these applications are the spark-shell or the zeppelin interpreter.

Even a simple sc.parallelize(1 to 1000).reduce(_ + _) launched in two
different spark-shells results in the issue we're observing. One of the
counts waits (in fact we don't even see the tasks being staged in mesos)
until the current one finishes. This is the biggest issue we have been
experiencing, and any help or advice would be greatly appreciated. We want to
be able to launch multiple jobs concurrently on our cluster and share
resources appropriately.

Another issue we see is that the java heap-space on the mesos executor
backend process is not being cleaned up once a job has finished in the
spark shell.
I've attached a png file of the jvisualvm output showing that the heapspace
is still allocated on a worker node. If I force the GC from jvisualvm then
nearly all of that memory gets cleaned up. This may be because the
spark-shell is still active - but if we've waited long enough why doesn't
GC just clean up the space? However, even after forcing GC the mesos UI
shows us that these resources are still being used.
There should be a way to bring down the memory utilization of the executors
once a task is finished. It shouldn't continue to have that memory
allocated, even if a spark-shell is active on the driver.

We have mesos configured to use fine-grained mode.
The following are parameters we have set in our spark-defaults.conf file.


spark.eventLog.enabled   true
spark.eventLog.dir   hdfs://frontend-system:8090/directory

spark.local.dir/data/cluster-local/SPARK_TMP

spark.executor.memory50g

spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
spark.executor.uri  hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz

spark.mesos.coarse  false

Please let me know if there are any questions about our configuration.
Any advice or experience the mesos community can share pertaining to issues
with fine-grained mode would be greatly appreciated!

I would also like to sincerely apologize for my previous test message on
the mailing list.
It was an ill-conceived idea; we are in a bit of a time crunch and I
needed to get this message posted. I forgot I needed to send a reply to
the user-subscribe address to be listed, which resulted in "message not
sent" emails. I will not do that again.

Thanks,

Rahul Palamuttam


Re: test

2016-07-13 Thread daemeon reiydelle
Why are you wasting our time with this? Lame.


Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Jul 13, 2016 at 11:56 AM, Rahul Palamuttam 
wrote:

>
>


test

2016-07-13 Thread Rahul Palamuttam



Re: mesos/dcos user issue?

2016-07-13 Thread Joseph Wu
Looks like you solved your problem:

> either remove the "USER" statement or add the user locally on the mesos
> agent machines

You can't run as a user that doesn't exist :)
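
For anyone hitting the same thing, a sketch of that second workaround (the
username and UID are the placeholders from this thread; run on every Mesos
agent):

    # Create a matching local account so the agent can resolve the user
    # named in the image's USER statement.
    sudo useradd --uid 4567 --no-create-home myuser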

On Wed, Jul 13, 2016 at 7:18 AM, Clarke, Trevor  wrote:

> I've got an image with a local user and a 'USER myuser' statement in the
> Dockerfile. When I try and run a container in mesos (we're using DC/OS but
> I think it's mesos related as we're not calling via marathon, etc. it's
> from a custom framework) I need "Failed to get user information for
> 'myuser'" unless I either remove the "USER" statement or add the user
> locally on the mesos agent machines. I still see a similar issue if I use
> "USER 4567" with a UID instead of username. Any idea what might be causing
> this?
>
> --
> Trevor R.H. Clarke
> Software Engineer, Ball Aerospace
> (937)320-7087
>
>
>
>


mesos/dcos user issue?

2016-07-13 Thread Clarke, Trevor
I've got an image with a local user and a 'USER myuser' statement in the 
Dockerfile. When I try and run a container in mesos (we're using DC/OS but I 
think it's mesos related as we're not calling via marathon, etc. it's from a 
custom framework) I get "Failed to get user information for 'myuser'" unless I
either remove the "USER" statement or add the user locally on the mesos agent 
machines. I still see a similar issue if I use "USER 4567" with a UID instead 
of username. Any idea what might be causing this?

--
Trevor R.H. Clarke
Software Engineer, Ball Aerospace
(937)320-7087





OS X latency issue when run as a plist

2016-07-13 Thread Rinaldo Digiorgio
Hi,

There have been prior discussions on the list about the OS X Latency 
issue. I had filed a bug here:

https://issues.apache.org/jira/browse/MESOS-5589 


We have found that the root cause is starting the mesos application in 
the background using a plist entry.  If you launch the mesos agent from a 
terminal it works fine.  We have tried to get a plist (not an app) to work and 
none of the documented settings in launchd remove the latency issue.


https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man5/launchd.plist.5.html
 


The settings we tried are:

 ProcessType 
 This optional key describes, at a high level, the intended purpose of the 
job.  The system will apply
 resource limits based on what kind of job it is. If left unspecified, the 
system will apply light
 resource limits to the job, throttling its CPU usage and I/O bandwidth. 
The following are valid values:

   Background
   Background jobs are generally processes that do work that was not 
directly requested by the user.
   The resource limits applied to Background jobs are intended to 
prevent them from disrupting the
   user experience.

   Standard
   Standard jobs are equivalent to no ProcessType being set.

   Adaptive
   Adaptive jobs move between the Background and Interactive 
classifications based on activity over
   XPC connections. See xpc_transaction_begin(3) for details.

   Interactive
   Interactive jobs run with the same resource limitations as apps,
   that is to say, none. Interactive jobs are critical to maintaining a
   responsive user experience, and this key should only be used if an
   app's ability to be responsive depends on it, and cannot be made
   Adaptive.
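
For concreteness, here is a minimal sketch of the kind of plist in question
(the label, binary path, and master address below are placeholders, not our
actual configuration; per the above, even ProcessType set to Interactive
did not remove the latency):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
      "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <!-- Label, binary path, and master address are placeholders. -->
        <key>Label</key>
        <string>org.apache.mesos.agent</string>
        <key>ProgramArguments</key>
        <array>
            <string>/usr/local/sbin/mesos-agent</string>
            <string>--master=zk://master-host:2181/mesos</string>
        </array>
        <key>RunAtLoad</key>
        <true/>
        <key>ProcessType</key>
        <string>Interactive</string>
    </dict>
    </plist>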


The mesos agent works correctly if you start it as a GUI app. This leaves an
icon on the screen. One can live with it, but it is an indication of the lack
of proper documentation from Apple, and/or an utter lack of understanding of
background applications on the desktop OS known as OS X. If someone has a
plist solution please share it. It is not reasonable to start mesos agents
from a terminal session or cron; the operating system should manage startup
and shutdown.

Rinaldo