Re: Executors no longer inherit environment variables from the agent

2016-03-10 Thread Connor Doyle
Rodrick, in your case those environment variables are set by the
framework as part of the TaskInfo, so those shouldn't be affected by
the change.

On Thu, Mar 10, 2016 at 10:38 AM, Rodrick Brown  wrote:
> This is unfortunate; we are using environment variables that get passed into
> the executor's context, such as
>
> CHRONOS_RESOURCE_MEM
> MARATHON_APP_RESOURCE_MEM
>
> What will be the workaround?
>
>>
>> On Mar 8 2016, at 2:33 pm, Gilbert Song  wrote:
>>
>> Hi,
>>
>>
>> TL;DR Executors will no longer inherit environment variables from the
>> agent by default in 0.30.
>>
>>
>> Currently, executors inherit environment variables from the agent
>> in the Mesos containerizer by default. This is an unfortunate legacy behavior
>> and is insecure. If you do have environment variables that you want to pass
>> to the executors, you can set them explicitly by using the
>> `--executor_environment_variables` agent flag.
>>
>>
>> Starting from 0.30, we will no longer allow executors to inherit
>> environment variables from the agent. In other words,
>> `--executor_environment_variables` will be set to “{}” by default. If you do
>> depend on the original behavior, please set the
>> `--executor_environment_variables` flag explicitly.
>>
>>
>> Let us know if you have any comments or concerns.
>>
>>
>> Thanks,
>>
>> Gilbert
>
>



-- 
connor
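
A minimal sketch of how an operator might assemble the opt-in value for the `--executor_environment_variables` flag discussed in this thread, assuming the flag takes a JSON object of name/value pairs. The helper name `executor_env_flag` and the choice of variables are hypothetical, not part of Mesos.

```python
import json


def executor_env_flag(agent_env, allowed):
    """Build the agent flag that forwards only an explicit allow-list.

    agent_env: mapping of the agent's environment (for example dict(os.environ)).
    allowed:   names the operator deliberately wants executors to see.
    Assumption: the flag value is a JSON object of name/value strings.
    """
    selected = {name: agent_env[name] for name in allowed if name in agent_env}
    # sort_keys keeps the output stable so it can be diffed between hosts.
    return "--executor_environment_variables=" + json.dumps(selected, sort_keys=True)
```

For example, `executor_env_flag(dict(os.environ), {"PATH"})` yields a flag that passes only `PATH`. The value would still need shell quoting when placed on a command line. Variables that a framework sets in the TaskInfo are a separate mechanism and are not covered by this helper.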


Re: "Chaos monkey" for mesos?

2016-02-25 Thread Connor Doyle
There's no way to kill a single task through the Mesos control
surfaces, but if you let the "chaos" framework launch tasks as a
privileged user, you can run wild.

On Thu, Feb 25, 2016 at 2:49 PM, Srikanth Viswanathan
 wrote:
> Sorry, ignore my first question. A framework can obviously kill tasks. I was
> just unsure as to whether it can kill foreign tasks, which leaves only my
> second question.
>
> On Thu, Feb 25, 2016 at 5:23 PM, Srikanth Viswanathan 
> wrote:
>>
>> Appreciate all the responses here. I'll look into `mesos-execute`.
>>
>> I was thinking about the framework idea in passing but my mesos knowledge
>> isn't up to scratch yet, so I haven't been able to pursue it yet. There are
>> many questions in my mind w.r.t. designing this as a framework:
>> * Doesn't a framework only receive offers from mesos and launch tasks? How
>> would a framework kill tasks? Can it also kill slaves?
>> * Is it legal in mesos for one framework to kill tasks belonging to
>> another framework?
>>
>> Thanks.
>> Srikanth
>>
>> On Thu, Feb 25, 2016 at 4:58 PM, Connor Doyle 
>> wrote:
>>>
>>> I think you could approximate that tool's behavior with some scripting
>>> plus `mesos-execute` (ships with the distribution) or by writing a
>>> really simple framework that just turns things off.
>>>
>>> On Thu, Feb 25, 2016 at 1:14 PM, Srikanth Viswanathan
>>>  wrote:
>>> > Thanks, Craig and David. I'm curious about the design and use of that
>>> > tool.
>>> > Based on the video, it looks close to what I hope to do.
>>> >
>>> > A web search didn't yield any results about it, however. Does anyone
>>> > here
>>> > know more about the dcos chaos tool?
>>> >
>>> > Thanks again.
>>> > Srikanth
>>> >
>>> > On Thu, Feb 25, 2016 at 12:21 PM, craig w  wrote:
>>> >>
>>> >> here's a direct link into the video
>>> >> https://youtu.be/0I6qG9RQUnY?t=389
>>> >>
>>> >> On Thu, Feb 25, 2016 at 12:17 PM, David Wood 
>>> >> wrote:
>>> >>>
>>> >>> The DCOS tutorial mentions a chaos tool at the end of the video.  Not
>>> >>> sure if that's what you're looking for, but it might be something to
>>> >>> follow up
>>> >>> on somehow.
>>> >>>
>>> >>> https://mesosphere.com/learn/
>>> >>>
>>> >>> David Wood
>>> >>> Computing Systems for Wireless Networks
>>> >>> IBM TJ Watson Research Center
>>> >>> daw...@us.ibm.com
>>> >>> 914-945-4923 (office), 914-396-6515 (mobile)
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> From: Srikanth Viswanathan
>>> >>> To: user@mesos.apache.org
>>> >>> Date: 02/25/2016 12:01 PM
>>> >>> Subject: "Chaos monkey" for mesos?
>>> >>> 
>>> >>>
>>> >>>
>>> >>>
>>> >>> Has there been any work done to develop a "chaos monkey" analogue for
>>> >>> Mesos? I have been researching on how to write one, but I wanted to
>>> >>> know if
>>> >>> there's any work already available that I can take a look at for
>>> >>> comparison,
>>> >>> and possibly re-use.
>>> >>>
>>> >>> The end goal would be something loaded into Mesos or separate from
>>> >>> Mesos
>>> >>> that randomly kills tasks. Could it be something as simple as an
>>> >>> application
>>> >>> that uses the KILL HTTP request from the scheduler API to kill tasks?
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>> Srikanth
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >> https://github.com/mindscratch
>>> >> https://www.google.com/+CraigWickesser
>>> >> https://twitter.com/mind_scratch
>>> >> https://twitter.com/craig_links
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> connor
>>
>>
>



-- 
connor


Re: "Chaos monkey" for mesos?

2016-02-25 Thread Connor Doyle
I think you could approximate that tool's behavior with some scripting
plus `mesos-execute` (ships with the distribution) or by writing a
really simple framework that just turns things off.

On Thu, Feb 25, 2016 at 1:14 PM, Srikanth Viswanathan
 wrote:
> Thanks, Craig and David. I'm curious about the design and use of that tool.
> Based on the video, it looks close to what I hope to do.
>
> A web search didn't yield any results about it, however. Does anyone here
> know more about the dcos chaos tool?
>
> Thanks again.
> Srikanth
>
> On Thu, Feb 25, 2016 at 12:21 PM, craig w  wrote:
>>
>> here's a direct link into the video
>> https://youtu.be/0I6qG9RQUnY?t=389
>>
>> On Thu, Feb 25, 2016 at 12:17 PM, David Wood  wrote:
>>>
>>> The DCOS tutorial mentions a chaos tool at the end of the video.  Not
>>> sure if that's what you're looking for, but it might be something to follow up
>>> on somehow.
>>>
>>> https://mesosphere.com/learn/
>>>
>>> David Wood
>>> Computing Systems for Wireless Networks
>>> IBM TJ Watson Research Center
>>> daw...@us.ibm.com
>>> 914-945-4923 (office), 914-396-6515 (mobile)
>>>
>>>
>>>
>>>
>>> From: Srikanth Viswanathan
>>> To: user@mesos.apache.org
>>> Date: 02/25/2016 12:01 PM
>>> Subject: "Chaos monkey" for mesos?
>>> 
>>>
>>>
>>>
>>> Has there been any work done to develop a "chaos monkey" analogue for
>>> Mesos? I have been researching on how to write one, but I wanted to know if
>>> there's any work already available that I can take a look at for comparison,
>>> and possibly re-use.
>>>
>>> The end goal would be something loaded into Mesos or separate from Mesos
>>> that randomly kills tasks. Could it be something as simple as an application
>>> that uses the KILL HTTP request from the scheduler API to kill tasks?
>>>
>>> Thanks.
>>>
>>> Srikanth
>>>
>>
>>
>>
>> --
>>
>> https://github.com/mindscratch
>> https://www.google.com/+CraigWickesser
>> https://twitter.com/mind_scratch
>> https://twitter.com/craig_links
>
>



-- 
connor


Re: Recommended way to discover current master

2015-08-31 Thread Connor Doyle
It's also worth noting the existence of the `mesos-resolve` binary, which can 
turn a canonical Mesos ZK string into the leading master location.
--
Connor
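
Besides `mesos-resolve`, the redirect behaviour described in the quoted reply below can be sketched in a few lines, assuming the master answers `GET /master/redirect` with a 302 whose `Location` header names the leader. The endpoint path, the default port 5050 and both function names are assumptions for illustration.

```python
import http.client
from urllib.parse import urlparse


def leader_from_location(location, default_port=5050):
    """Extract (host, port) from a Location header value.

    Accepts absolute URLs and scheme-relative values such as //host:5050.
    """
    parsed = urlparse(location)
    if not parsed.hostname:
        raise ValueError("no host in Location header: %r" % location)
    return parsed.hostname, parsed.port or default_port


def find_leader(master_host, port=5050, timeout=5.0):
    """Ask any known master where the current leader is.

    http.client does not follow redirects, so the 302 can be inspected.
    """
    conn = http.client.HTTPConnection(master_host, port, timeout=timeout)
    try:
        conn.request("GET", "/master/redirect")  # assumed endpoint path
        location = conn.getresponse().getheader("Location")
    finally:
        conn.close()
    if location is None:
        raise RuntimeError("master did not send a redirect")
    return leader_from_location(location, default_port=port)
```

A framework that already knows the master list could call `find_leader` on each host in turn until one responds. Going through ZooKeeper avoids needing that list at all.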


> On Aug 31, 2015, at 10:39, Marco Massenzio  wrote:
> 
> The easiest way is to access ZooKeeper directly, as you don't need to
> know a priori the list of Masters; if you do, however, hitting any one of
> them will redirect (302) to the current Leader.
> 
> If you would like to see an example of how to retrieve that info from ZK, I 
> have written about it here[0].
> Finally, we're planning to make all this available via the Mesos Commons[1]
> library (currently, there is a PR[2] waiting to be merged).
> 
> 
> [0] 
> http://codetrips.com/2015/08/16/apache-mesos-leader-master-discovery-using-zookeeper-part-2/
> [1] https://github.com/mesos/commons
> [2] https://github.com/mesos/commons/pull/2/files
> 
> Marco Massenzio
> Distributed Systems Engineer
> http://codetrips.com
> 
> On Mon, Aug 31, 2015 at 10:25 AM, Philip Weaver  
> wrote:
> My framework knows the list of zookeeper hosts and the list of mesos master 
> hosts.
> 
> I can think of a few ways for the framework to figure out which host is the 
> current master. What would be the best? Should I check in zookeeper directly? 
> Does the mesos library expose an interface to discover the master from 
> zookeeper or otherwise? Should I just try each possible master until one 
> responds?
> 
> Apologies if this is already well documented, but I wasn't able to find it. 
> Thanks!
> 
> - Philip
> 
> 



Re: Custom executor

2015-07-29 Thread Connor Doyle
In Marathon the executor ID is unique every time, so tasks and executors will 
be in 1:1 correspondence. More generally, if you re-use the executor info 
message when launching tasks, a single executor can handle multiple tasks 
simultaneously.

> On Jul 29, 2015, at 09:43, Aaron Carey  wrote:
> 
> ah cool! Will that run as one instance per task, or one scheduler per slave?
> 
> 
> From: Connor Doyle [connor@gmail.com]
> Sent: 29 July 2015 17:24
> To: user@mesos.apache.org
> Subject: Re: Custom executor
> 
> You don't even have to pre-load the executor on the slave boxes -- just add 
> it as a URL and it will be downloaded to the sandbox like any other resource!
> 
> On Jul 29, 2015, at 02:47, Aaron Carey  wrote:
> 
>> Ah I see.. so is it simply a case of making the executor file executable, 
>> putting it on the slave, and supplying the path to it in the JSON?
>> 
>> Thanks!
>> 
>> Aaron
>> 
>> From: Ondrej Smola [ondrej.sm...@gmail.com]
>> Sent: 29 July 2015 10:13
>> To: user@mesos.apache.org
>> Subject: Re: Custom executor
>> 
>> Hi Aaron,
>> 
>> custom executor should be supported by Marathon - I don't use it, but from the
>> tests in
>> 
>> https://github.com/mesosphere/marathon/blob/master/src/test/scala/mesosphere/mesos/TaskBuilderTest.scala#L236
>> 
>> there is an option to specify the path to a custom executor.
>> 
>> https://mesosphere.github.io/marathon/docs/rest-api.html#post-/v2/apps
>> 
>> in task definition there is "executor" json prop
>> 
>> Chronos also supports this property
>> 
>> 
>> Download/create some simple executor and try to test it.
>> 
>> 
>> 
>> 
>> 2015-07-29 11:00 GMT+02:00 Aaron Carey :
>>> Hi Tim,
>>> 
>>> We have some specific requirements for moving data around when executing 
>>> tasks on slaves, I want to be able to 'check out' a selection of files, and 
>>> possibly mount filesystems onto the slave (and subsequently into the 
>>> executing docker container). The data required by each task is specified in 
>>> our database.
>>> 
>>> Basically I wanted to customise an executor to prepare the data on the 
>>> slave before executing the docker container, rather than having to get the 
>>> container to download its own data or attempt to mount NFS volumes itself.
>>> 
>>> I hope that all makes sense, I couldn't find a simple solution to this 
>>> using the existing architecture.. I'd love to know your thoughts though!
>>> 
>>> Thanks,
>>> Aaron
>>> 
>>> From: Tim Chen [t...@mesosphere.io]
>>> Sent: 28 July 2015 19:01
>>> To: user@mesos.apache.org
>>> Subject: Re: Custom executor
>>> 
>>> Can you explain what your motivations are and what your new custom executor 
>>> will do?
>>> 
>>> Tim
>>> 
>>>> On Tue, Jul 28, 2015 at 5:08 AM, Aaron Carey  wrote:
>>>> Hi,
>>>> 
>>>> Is it possible to build a custom executor which is not associated with a 
>>>> particular scheduler framework? I want to be able to write a custom 
>>>> executor which is available to multiple schedulers (eg Marathon, Chronos 
>>>> and our own custom scheduler). Is this possible? I couldn't quite figure 
>>>> out the best way to go about this from the docs? Is it possible to mix and 
>>>> match languages for schedulers and executors? (ie one is python one is C++)
>>>> 
>>>> Thanks,
>>>> Aaron


Re: Custom executor

2015-07-29 Thread Connor Doyle
You don't even have to pre-load the executor on the slave boxes -- just add it 
as a URL and it will be downloaded to the sandbox like any other resource!
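
A rough sketch of what such an app definition could look like, assuming Marathon's `executor` field takes a path to the executor binary and `uris` lists resources fetched into the sandbox. The function name, the example URL and the field values are hypothetical, and the exact field semantics should be checked against the Marathon REST API docs for the version in use.

```python
import posixpath
from urllib.parse import urlparse


def app_with_custom_executor(app_id, executor_url, cmd="", cpus=0.1, mem=128.0):
    """Build a Marathon-style app definition that ships its own executor.

    Assumption: the fetched file lands in the sandbox under its URL basename,
    so the executor path can be given relative to the sandbox.
    """
    filename = posixpath.basename(urlparse(executor_url).path)
    if not filename:
        raise ValueError("executor URL has no file name: %r" % executor_url)
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem,
        "instances": 1,
        "executor": "./" + filename,  # path inside the task sandbox
        "uris": [executor_url],       # downloaded like any other resource
    }
```

Serialising the returned dict with `json.dumps` gives a body that could be posted to `/v2/apps`. Nothing needs to be installed on the slaves ahead of time.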

> On Jul 29, 2015, at 02:47, Aaron Carey  wrote:
> 
> Ah I see.. so is it simply a case of making the executor file executable, 
> putting it on the slave, and supplying the path to it in the JSON?
> 
> Thanks!
> 
> Aaron
> 
> From: Ondrej Smola [ondrej.sm...@gmail.com]
> Sent: 29 July 2015 10:13
> To: user@mesos.apache.org
> Subject: Re: Custom executor
> 
> Hi Aaron,
> 
> custom executor should be supported by Marathon - I don't use it, but from the
> tests in
> 
> https://github.com/mesosphere/marathon/blob/master/src/test/scala/mesosphere/mesos/TaskBuilderTest.scala#L236
> 
> there is an option to specify the path to a custom executor.
> 
> https://mesosphere.github.io/marathon/docs/rest-api.html#post-/v2/apps
> 
> in task definition there is "executor" json prop
> 
> Chronos also supports this property
> 
> 
> Download/create some simple executor and try to test it.
> 
> 
> 
> 
> 2015-07-29 11:00 GMT+02:00 Aaron Carey :
>> Hi Tim,
>> 
>> We have some specific requirements for moving data around when executing 
>> tasks on slaves, I want to be able to 'check out' a selection of files, and 
>> possibly mount filesystems onto the slave (and subsequently into the 
>> executing docker container). The data required  by each task is specified in 
>> our database.
>> 
>> Basically I wanted to customise an executor to prepare the data on the slave 
>> before executing the docker container, rather than having to get the 
>> container to download its own data or attempt to mount NFS volumes itself.
>> 
>> I hope that all makes sense, I couldn't find a simple solution to this using 
>> the existing architecture.. I'd love to know your thoughts though!
>> 
>> Thanks,
>> Aaron
>> 
>> From: Tim Chen [t...@mesosphere.io]
>> Sent: 28 July 2015 19:01
>> To: user@mesos.apache.org
>> Subject: Re: Custom executor
>> 
>> Can you explain what your motivations are and what your new custom executor 
>> will do?
>> 
>> Tim
>> 
>>> On Tue, Jul 28, 2015 at 5:08 AM, Aaron Carey  wrote:
>>> Hi,
>>> 
>>> Is it possible to build a custom executor which is not associated with a 
>>> particular scheduler framework? I want to be able to write a custom 
>>> executor which is available to multiple schedulers (eg Marathon, Chronos 
>>> and our own custom scheduler). Is this possible? I couldn't quite figure 
>>> out the best way to go about this from the docs? Is it possible to mix and 
>>> match languages for schedulers and executors? (ie one is python one is C++)
>>> 
>>> Thanks,
>>> Aaron
> 


Re: Can Framework accept partial offers

2015-07-06 Thread Connor Doyle
Hi Ying,

When launching tasks, the scheduler includes the resources to consume.  The 
remainder is implicitly declined.
Also, the scheduler can accept and merge multiple offers from the same slave.

--
Connor
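
The accounting described above can be sketched with plain dictionaries of scalar resources. This is only an illustration of the arithmetic, not the scheduler API. `merge_offers` and `split_offer` are made-up names, and ranges such as ports are ignored.

```python
def merge_offers(offers):
    """Sum scalar resources from several offers made by the same slave."""
    merged = {}
    for offer in offers:
        for name, amount in offer.items():
            merged[name] = merged.get(name, 0.0) + amount
    return merged


def split_offer(offered, wanted):
    """Return (used, remainder) when a task takes only part of an offer.

    The remainder is what goes back to the allocator implicitly once the
    task is launched with just the resources in `wanted`.
    """
    for name, amount in wanted.items():
        if amount > offered.get(name, 0.0):
            raise ValueError("offer has too little %s" % name)
    remainder = {name: amount - wanted.get(name, 0.0)
                 for name, amount in offered.items()}
    return dict(wanted), remainder
```

With an offer of 4 cpus and 8192 MB, a task asking for 1 cpu and 2048 MB leaves 3 cpus and 6144 MB as the implicitly declined part.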


> On Jul 6, 2015, at 16:19, Ying Ji  wrote:
> 
> Hey, mesos experts:
> 
> I have a question about mesos resource allocation. If the framework sends 
> the resource request, the master will give the current best offer to the 
> framework (probably not the one which can satisfy the framework completely). 
> In this case, the framework can either accept the offer or decline the offer. 
> My question is: can the framework accept the partial offer, and decline the 
> other part ? 
> 
> 
> Thanks
> 
> Ying



Re: [DISCUSS] Renaming Mesos Slave

2015-06-02 Thread Connor Doyle
James, I'll just say one thing:

The proposed change is for the benefit of those who _do_ have a problem with 
the current name.

Of course you are free from having to empathize, but why block the change if 
there is support?
Finding out if there is wider support is the purpose of this thread.

--
Connor


> On Jun 2, 2015, at 11:43, CCAAT  wrote:
> 
> On 06/02/2015 11:58 AM, craig mcmillan wrote:
>> not being from a slavery oppressed minority i'm not in a position to
>> offer an opinion on the experience of the use of 'slave' in CS
>> terminology, and the definition of 'minion' doesn't seem overly more
>> empowering
>> 
>> however :
>> 
>> dom / sub
>> 
>> is more fun and a little bit cheeky
>> 
>> :c
> 
> 
> Ah. The nightclub scene is more salient; so Berlin is your favourite city?  
> What if the roles reverse; how does that map to mesos, clustering or parallel 
> efforts? For humorous reasons, I like
> Mommy --> daddy so as to promote females to participate in mesos?
> 
> 
> 
> I say all of this, as my grandfather, who later on in life became
> a pharmacist and drug store owner, was a slave in his youth. I find it no
> offense. The only thing I find offensive is those not willing to fight to
> overcome their circumstances. As an over educated person, I find the entire
> historical education experience much more offensive than something that has
> existed in every culture that is more than a few hundred years old. For me,
> obtaining education and then social status, from elites, is an ugly process.
> Now, here in the USA, we
> have graduates in debt up to their eyes and often no jobs. You want to
> address a social-ill, why not just get rid of "tenure" and put the pedantics
> on the same hire-fire master-slave relationship graduates are under?  The
> past is just that; the past, learn from it and move on. Take
> actions about TODAY and tomorrow. Stop wallowing in the self-pity of what
> others did hundreds or thousands of years ago!
> 
> 
> WE still have wage-slaves,  sex-slaves and many forms of human traffic that 
> are or are very, very close to slavery. Try to show your independence, as 
> part of a military collective; commander-slave.
> 
> How about elite-slave?  politician-slave?  Ivy_league--community_college
> for names?
> 
> 
> As a solution, why don't we make these relationships 'user defined 
> variables'?  Surely that would be great fun and prepare us for supporting 
> languages such as Haskell in a fun and ambitious function
> sort of way? [1]
> 
> 
> 
> James
> 
> [1] http://lesswrong.com/lw/k1o/botworld_a_cellular_automaton_for_studying/
> 



Re: [DISCUSS] Renaming Mesos Slave

2015-06-01 Thread Connor Doyle
+1

1. Mesos Worker [node/host/machine]
2. Mesos Worker [process]
3. No, master/worker seems to address the issue with fewer changes.
4. Begin using the new name ASAP, add a disambiguation to the docs, and change 
old references over time.  Fixing the "official" name, even before changes are 
in place, would be a good first step.

--
Connor


> On Jun 1, 2015, at 14:18, Adam Bordelon  wrote:
> 
> There has been much discussion about finding a less offensive name than 
> "Slave", and many of these thoughts have been captured in 
> https://issues.apache.org/jira/browse/MESOS-1478
> 
> I would like to open up the discussion on this topic for one week, and if we 
> cannot arrive at a lazy consensus, I will draft a proposal from the 
> discussion and call for a VOTE.
> Here are the questions I would like us to answer:
> 1. What should we call the "Mesos Slave" node/host/machine?
> 2. What should we call the "mesos-slave" process (could be the same)?
> 3. Do we need to rename Mesos Master too?
> 
> Another topic worth discussing is the deprecation process, but we don't 
> necessarily need to decide on that at the same time as deciding the new 
> name(s).
> 4. How will we phase in the new name and phase out the old name?
> 
> Please voice your thoughts and opinions below.
> 
> Thanks!
> -Adam-
> 
> P.S. My personal thoughts:
> 1. Mesos Worker [Node]
> 2. Mesos Worker or Agent
> 3. No
> 4. Carefully



Re: Mess cluster resources utilization

2015-05-07 Thread Connor Doyle
There is also a /v2/queue endpoint in Marathon to inspect tasks that have been 
released, but not yet scheduled.
--
Connor
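
The comparison Adam describes in the quoted reply below (pending demand versus unallocated resources, mapped onto X more nodes with Y resources each) can be approximated as follows. This is a back-of-the-envelope sketch under stated assumptions: the inputs are already extracted from Marathon and the Mesos statistics endpoints, the function name is invented, and the result is a lower bound because it ignores fragmentation and unique-host constraints.

```python
import math


def extra_nodes_needed(pending_tasks, free, node_size):
    """Estimate how many nodes of a given size cover the resource shortfall.

    pending_tasks: list of dicts like {"cpus": 1.0, "mem": 512.0}
    free:          unallocated cluster totals, same keys
    node_size:     resources one new node would add, same keys
    """
    nodes = 0
    for name, per_node in node_size.items():
        demand = sum(task.get(name, 0.0) for task in pending_tasks)
        shortfall = max(0.0, demand - free.get(name, 0.0))
        # The scarcest resource decides how many nodes are required.
        nodes = max(nodes, math.ceil(shortfall / per_node))
    return nodes
```

Three pending tasks of 2 cpus and 4096 MB each, with 1 cpu and 2048 MB free, need two nodes of 4 cpus and 8192 MB by this estimate.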

> On May 7, 2015, at 01:07, Adam Bordelon  wrote:
> 
> Yaron, I meant by comparing the available info. You could query Marathon's 
> /v2/apps endpoint to get the list of pending tasks and the resources 
> requested for each of them, and you could check the Mesos master and slave 
> /statistics.json to see the total amount of unallocated resources to estimate 
> how many additional resources you need for how many instances (may need 
> unique hosts) of pending tasks. Then you would have to map this onto a 
> request in a (cloud) provisioning tool for X more nodes with Y resources each.
> 
> Alternatively, you could use this same information, along with some notion of 
> relative priority to kill off (and scale down) lower priority tasks until you 
> have enough resources to satisfy your higher priority tasks.
> 
>> On Mon, May 4, 2015 at 10:32 AM, Tim Chen  wrote:
>> Hi Yaron,
>> 
>> Marathon itself has its own REST endpoint you can hit (/v2/apps) that will 
>> return to you all the apps and tasks information, so you can see how many of 
>> the apps are launched and how many are still pending.
>> 
>> Tim
>> 
>>> On Mon, May 4, 2015 at 5:28 AM, Yaron Rosenbaum  
>>> wrote:
>>> Hi Adam,
>>> 
>>> For example, with Marathon - how can I get the list of pending tasks ? and 
>>> by  “how many additional nodes you would need to satisfy them” - do you 
>>> mean, by comparing the two? or is there statistics for that too?
>>> 
>>> Thanks
>>> 
>>> (Y)
>>> 
>>>> On May 3, 2015, at 10:10 AM, Adam Bordelon  wrote:
>>>>
>>>> Yaron,
>>>>
>>>> You could use the /statistics.json endpoints to monitor the cpu/memory
>>>> allocation across your cluster, even on individual nodes.
>>>> Only individual frameworks know their own pending tasks and how many
>>>> additional resources you would need to satisfy them.
>>>> Given these pieces of information, you should be able to trigger your own
>>>> auto-provisioning mechanism.
>>>>
>>>>> On Fri, May 1, 2015 at 11:18 AM, Yaron Rosenbaum
>>>>>  wrote:
>>>>> Hi
>>>>>
>>>>> Is there a way in mesos / marathon to know that tasks cannot be assigned
>>>>> due to lack of resources? or in other words - when to add mesos-slaves to
>>>>> the cluster?
>>>>>
>>>>> Or even more specifically, what amount of resources are missing (or in
>>>>> excess) given the current tasks and slaves?
>>>>>
>>>>> Thanks
>>>>> (Y)
>>>>>


Re: CPU resource allocation: ignore?

2015-03-11 Thread Connor Doyle
If you don't care at all about accounting usage of that resource then you 
should be able to set it to 0.0.  As Ian mentioned, this won't be enforced with 
the cpu isolator disabled.
--
Connor

> On Mar 11, 2015, at 08:43, Ian Downes  wrote:
> 
> The --isolation flag for the slave determines how resources are *isolated*, 
> i.e., by not specifying any cpu isolator there will be no isolation between 
> executors for cpu usage; the Linux scheduler will try to balance their 
> execution.
> 
> Cpu and memory are considered required resources for executors and I believe 
> the master enforces this.
> 
> What behavior are you trying to achieve? If your jobs don't require much
> cpu then can you not just set a small value, like 0.25 cpu?
> 
>> On Wed, Mar 11, 2015 at 7:20 AM, Geoffroy Jabouley 
>>  wrote:
>> Hello
>> 
>> As relative cpu shares are *not very* relevant in our heterogeneous cluster,
>> we would like to get rid of CPU resources management and only use MEM 
>> resources for our cluster and tasks allocation.
>> 
>> Even when modifying the isolation flag of our slave to 
>> "--isolation=cgroups/mem", we see these in the logs:
>> 
>> from the slave, at startup:
>> I0311 15:09:55.006750 50906 slave.cpp:289] Slave resources: 
>> ports(*):[31000-32000, 80-443]; cpus(*):2; mem(*):1979; disk(*):22974
>> 
>> from the master:
>> I0311 15:15:16.764714 50884 hierarchical_allocator_process.hpp:563] 
>> Recovered ports(*):[31000-32000, 80-443]; cpus(*):2; mem(*):1979; 
>> disk(*):22974 (total allocatable: ports(*):[31000-32000, 80-443]; cpus(*):2; 
>> mem(*):1979; disk(*):22974) on slave 
>> 20150311-150951-3982541578-5050-50860-S0 from framework 
>> 20150311-150951-3982541578-5050-50860-
>> 
>> And mesos master UI is showing both CPU and MEM resources status.
>> 
>> 
>> 
>> Btw, we are using Marathon and Jenkins frameworks to start our mesos tasks, 
>> and the "cpus" field seems mandatory (set to 1.0 by default). So i guess you 
>> cannot easily bypass cpu resources allocation...
>> 
>> 
>> Any idea?
>> Regards
>> 
>> 2015-02-19 15:15 GMT+01:00 Ryan Thomas :
>>> Hey Don,
>>> 
>>> Have you tried only setting the 'cgroups/mem' isolation flag on the slave 
>>> and not the cpu one? 
>>> 
>>> http://mesosphere.com/docs/reference/mesos-slave/
>>> 
>>> 
>>> ryan
>>> 
>>>> On 19 February 2015 at 14:13, Donald Laidlaw  wrote:
>>>> I am using Mesos 0.21.1 with Marathon 0.8.0 and running everything in
>>>> docker containers.
>>>>
>>>> Is there a way to have mesos ignore the cpu relative shares? That is, not
>>>> limit the docker container CPU at all when it runs. I would still want to
>>>> have the Memory resource limitation, but would rather just let the linux
>>>> system under the containers schedule all the CPU.
>>>>
>>>> This would allow us to just allocate tasks to mesos slaves based on
>>>> available memory only, and to let those tasks get whatever CPU they could
>>>> when they needed it. This is desirable where there can be lots of
>>>> relatively high-memory tasks that have very low CPU requirements. Especially
>>>> if we do not know the capabilities of the slave machines with regards to
>>>> CPU. Some of them may have fast CPUs, some slow, so it is hard to pick a
>>>> relative number for that slave.
>>>>
>>>> Thanks,
>>>>
>>>> Don Laidlaw
> 


Re: Proposal: shared Mesos framework hosting and registry

2014-12-14 Thread Connor Doyle
quest nightmare and I'll probably search for a way to provide to my
>>> customers restricted access (WebUI) to the cluster manager and allow them
>>> to deploy those new add-ons themselves.
>>>
>>> With this case study in mind, I'd be happy to have a registry which
>>> allows a filtering mechanism to hide or show frameworks/modules/etc by status
>>> (Official Apache Content / Public Content / Private Content).
>>>
>>> Here are my thoughts about the registry articulation:
>>>
>>> Mesos Registry is an integrated part of Mesos master services.
>>> Mesos Registry is an endpoint available to WebUI and CLI.
>>> Mesos Registry is nothing but a metadata registry.
>>> Mesos Registry save its configs and metadata on a key/value store
>>> (zookeeper?).
>>> Mesos Registry is empty at the first launch.
>>> Mesos Registry has three views: Official Apache Content | Publicly
>>> Maintained Content | Privately Maintained Content
>>> Mesos Registry views content is:
>>>
>>> Official Apache Content == Link to existing Apache add-ons hosted on
>>> Github/Git repository + metadata (Like those proposed by Dave and Connor).
>>> Publicly Maintained Content == Links to existing repositories (Github /
>>> Git / other) + metadata (Like those proposed by Dave and Connor).
>>> Private Content == Links to existing private (GIT/GITHUB/Other
>>> repositories) + metadata (Like those proposed by Dave and Connor), this
>>> repository is kinda special as it is hosted and created by the cluster
>>> operators and could be a mixed content of locally maintained repository (GIT
>>> repos on a HDFS or TAR on HDFS) and public content repository cloned from
>>> the public content URL/Metadata.
>>>
>>> I don't know if I'm really clear, so if I'm not, let me know it, I'll do
>>> some sketches :D
>>>
>>>
>>>
>>> 2014-12-02 1:53 GMT+01:00 Connor Doyle :
>>>>
>>>> Hi Dave,
>>>>
>>>> This is a timely topic, since we have been prototyping and mocking up
>>>> something similar at Mesosphere.  We created a new public GitHub repository
>>>> for it about three weeks ago called "universe"
>>>> (http://github.com/mesosphere/universe).
>>>>
>>>> Although we have added some informal specs, it's very malleable at this
>>>> point.  We're very much interested in making our "universe" compatible 
>>>> with,
>>>> or the same as, the registry you're proposing.  Without delving into
>>>> implementation details, some of the goals we have in mind are outlined
>>>> below.
>>>>
>>>> Data Source:
>>>>
>>>> The package repository should be easily consumable by third-party
>>>> command-line and other programs.  There should be a condensed “index”
>>>> representation of the package repository available.
>>>>
>>>> Packages within the repository should be versioned.
>>>>
>>>> The package repository format itself should be versioned.
>>>>
>>>> Decentralization and Composability:
>>>>
>>>> The package metadata should be hosted in a public place (we like GitHub)
>>>> so that additional packages can be added by the community by simply
>>>> submitting pull requests.  We have added some rudimentary commit hooks and
>>>> automated validation to protect the repo against breaking changes.
>>>>
>>>> It’s important that no single entity “owns the keys” to the universe,
>>>> and that the spec and implementation remain public.  It should be easy and
>>>> free for organizations to maintain a private package repository.
>>>>
>>>> A corollary is that it should be easy for consumers to pull from a
>>>> hierarchy of upstream repositories.  One setup we have in mind is that an
>>>> organization might have staging and production repositories running
>>>> internally.  Packages are pushed to staging where integration testing can
>>>> run before “deployment” to production.  If a package isn’t in the local
>>>> repository it might be looked up and installed from upstream.
>>>>
>>>> 
>>>>
>>>>
>>>> Repositories should be able to be proxied and cached in this way.
>>>> Organizations should be able to isolate their datacenter but also
>>>> selectively

Re: Question about External Containerizer

2014-12-03 Thread Connor Doyle
You're right Sharma, it's dependent upon the framework.  If your scheduler sets 
a unique ExecutorID for each TaskInfo, then the executor will not be re-used 
and you won't have to worry about resizing the executor's container to 
accommodate subsequent tasks.  This might be a reasonable simplification to
start with, especially if your executor adds relatively low resource overhead.
--
Connor
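
To make the trade-off concrete, here is a small sketch of how container limits add up under the rule quoted below, where an executor's limit is its own resources plus those of every task it runs. The task records are simplified dictionaries rather than real TaskInfo messages, and `container_limits` is an invented helper.

```python
def container_limits(tasks):
    """Compute per-executor resource limits from simplified task records.

    Each task is a dict with "executor_id", "executor_resources" and
    "resources". Executor resources are counted once per executor, and
    every task launched on that executor adds its own resources on top.
    """
    limits = {}
    for task in tasks:
        limit = limits.setdefault(task["executor_id"],
                                  dict(task["executor_resources"]))
        for name, amount in task["resources"].items():
            limit[name] = limit.get(name, 0.0) + amount
    return limits
```

With a unique executor ID per task, each container is sized once and never resized. With a shared ID, the limit grows as tasks are added, which is the resizing case an external containerizer would have to handle.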


> On Dec 3, 2014, at 10:20, Sharma Podila  wrote:
> 
> This may have to do with fine-grain Vs coarse-grain resource allocation. 
> Things may be easier for you, Diptanu, if you are using one Docker container 
> per task (sort of coarse grain). In that case, I believe there's no need to 
> alter a running Docker container's resources. Instead, the resource update of 
> your executor translates into the right Docker containers running. There's 
> some details to be worked out there, I am sure. 
> It sounds like Tom's strategy uses the same Docker container for multiple 
> tasks. Tom, do correct me otherwise.
> 
> On Wed, Dec 3, 2014 at 3:38 AM, Tom Arnfeld  wrote:
> When Mesos is asked to launch a task (with either a custom Executor or the 
> built in CommandExecutor) it will first spawn the executor which _has_ to be 
> a system process, launched via command. This process will be launched inside 
> of a Docker container when using the previously mentioned containerizers.
> 
> Once the Executor registers with the slave, the slave will send it a number 
> of launchTask calls based on the number of tasks queued up for that executor. 
> The Executor can then do as it pleases with those tasks, whether it's just a 
> sleep(1) or to spawn a subprocess and do some other work. Given it is 
> possible for the framework to specify resources for both tasks and executors, 
> and the only thing which _has_ to be a system process is the executor, the 
> mesos slave will limit the resources of the executor process to the sum of 
> (TaskInfo.Executor.Resources + TaskInfo.Resources). 
> 
> Mesos also has the ability to launch new tasks on an already running 
> executor, so it's important that mesos is able to dynamically scale the 
> resource limits up and down over time. Designing a framework around this idea 
> can lead to some complex and powerful workflows which would be a lot more 
> complex to build without Mesos.
> 
> Just for an example... Spark.
> 
> 1) User launches a job on spark to map over some data
> 2) Spark launches a first wave of tasks based on the offers it received 
> (let's say T1 and T2)
> 3) Mesos launches executors for those tasks (let's say E1 and E2) on 
> different slaves
> 4) Spark launches another wave of tasks based on offers, and tells mesos to 
> use the same executor (E1 and E2)
> 5) Mesos will simply call launchTasks(T{3,4}) on the two already running 
> executors
> 
> At point (3) mesos is going to launch a Docker container and execute your 
> executor. However at (5) the executor is already running so the tasks will be 
> handed to the already running executor. 
> 
> Mesos will guarantee you (i'm 99% sure) that the resources for your container 
> have been updated to reflect the limits set on the tasks before handing the 
> tasks to you.
> 
> I hope that makes some sense!
> 
> --
> 
> Tom Arnfeld
> Developer // DueDil
> 
> 
> On Wed, Dec 3, 2014 at 10:54 AM, Diptanu Choudhury  wrote:
> 
> Thanks for the explanation Tom, yeah I just figured that out by reading your 
> code! You're touching the memory.soft_limit_in_bytes and 
> memory.limit_in_bytes directly.
> 
> Still curious to understand in which situations Mesos Slave would call the 
> external containerizer to update the resource limits of a container? My 
> understanding was that once resource allocation happens for a task, resources 
> are not taken away until the task exits[fails, crashes or finishes] or Mesos 
> asks the slave to kill the task. 
> 
> On Wed, Dec 3, 2014 at 2:47 AM, Tom Arnfeld  wrote:
> Hi Diptanu,
> 
> That's correct, the ECP has the responsibility of updating the resource for a 
> container, and it will do as new tasks are launched and killed for an 
> executor. Since docker doesn't support this, our containerizer (Deimos does 
> the same) goes behind docker to the cgroup for the container and updates the 
> resources in a very similar way to the mesos-slave. I believe this is also 
> what the built in Docker containerizer will do.
> 
> https://github.com/duedil-ltd/mesos-docker-containerizer/blob/master/containerizer/commands/update.py#L35
> 
> Tom.
> 
> --
> 
> Tom Arnfeld
> Developer // DueDil
> 
> 
> On Wed, Dec 3, 2014 at 10:45 AM, Diptanu Choudhury  wrote:
> 
> Hi,
> 
> I had a quick question about the external containerizer. I see that once the 
> Task is launched, the ECP can receive the update calls, and the protobuf 
> message passed to ECP with the update call is containerizer::Update. 
> 
> This protobuf has a Resources [list] field so does that mean Mesos might ask 
> a running task to re-adjust the enforced 

Re: Proposal: shared Mesos framework hosting and registry

2014-12-01 Thread Connor Doyle
Hi Dave,

This is a timely topic, since we have been prototyping and mocking up something 
similar at Mesosphere.  We created a new public GitHub repository for it about 
three weeks ago called "universe" (http://github.com/mesosphere/universe).

Although we have added some informal specs, it's very malleable at this point.  
We're very much interested in making our "universe" compatible with, or the 
same as, the registry you're proposing.  Without delving into implementation 
details, some of the goals we have in mind are outlined below.

Data Source:

The package repository should be easily consumable by third-party command-line 
and other programs.  There should be a condensed “index” representation of the 
package repository available.

Packages within the repository should be versioned.

The package repository format itself should be versioned.

Decentralization and Composability:

The package metadata should be hosted in a public place (we like GitHub) so 
that additional packages can be added by the community by simply submitting 
pull requests.  We have added some rudimentary commit hooks and automated 
validation to protect the repo against breaking changes.

It’s important that no single entity “owns the keys” to the universe, and that 
the spec and implementation remain public.  It should be easy and free for 
organizations to maintain a private package repository.

A corollary is that it should be easy for consumers to pull from a hierarchy of 
upstream repositories.  One setup we have in mind is that an organization might 
have staging and production repositories running internally.  Packages are 
pushed to staging where integration testing can run before “deployment” to 
production.  If a package isn’t in the local repository it might be looked up 
and installed from upstream.



Repositories should be able to be proxied and cached in this way.  
Organizations should be able to isolate their datacenter but also selectively 
add external packages for experimentation. The system should be sufficiently 
portable and extensible to accommodate these and similar use cases.
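
To make the upstream hierarchy a little more concrete, here is a small Python 
sketch.  It is not how universe works today; the Repository class, the 
repository names and the package entries are all invented for illustration.  
It only shows the lookup order: local repository first, then each upstream in 
turn.

```python
class Repository:
    """Hypothetical in-memory package repository with an optional upstream."""

    def __init__(self, name, packages=None, upstream=None):
        self.name = name
        self.packages = dict(packages or {})
        self.upstream = upstream

    def resolve(self, package):
        """Return (repository name, metadata) from the nearest repository
        in the chain that knows the package."""
        repo = self
        while repo is not None:
            if package in repo.packages:
                return repo.name, repo.packages[package]
            repo = repo.upstream
        raise KeyError(package)


# Invented example data: a public upstream plus two internal repositories.
public = Repository("public", {"some-framework": {"version": "0.1.0"}})
staging = Repository("staging", {"my-app": {"version": "2.0.0-rc1"}}, upstream=public)
production = Repository("production", {"my-app": {"version": "1.4.2"}}, upstream=public)

print(production.resolve("my-app"))
print(production.resolve("some-framework"))
```

An isolated datacenter would simply leave upstream unset and copy in the few 
external packages it wants to try.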

Meta-Framework Descriptors:

Our conception of the package repository is a bit more expansive than just 
Mesos frameworks; it includes descriptions of how to install any piece of 
server software on a Mesos cluster.  Frameworks and non-frameworks alike may be 
installed using some other meta-framework that’s responsible for starting all 
other cluster services.  Likely candidates for this role are the long-lived 
frameworks: Aurora, Marathon, Singularity, and eventually Kubernetes.  In any 
case, the repository spec should not be prescriptive with respect to this 
choice.

The package repository metadata should make it easy for Mesos framework authors 
(and authors of non-Mesos-aware programs) to describe how to install their 
software on a Mesos cluster.  To this end, our prototype package spec allows 
for Meta-framework descriptor files for each package in the repository.  For 
example for a given package we might see a `marathon.json` file as well as a 
`my-app.aurora` file.

An obvious concern is how to specify site-specific arguments upon installation.  
Here packages should describe data that must be marshalled from the 
environment (e.g. by prompting a user) and combined with the raw meta-framework 
descriptor to launch the app.  These configuration parameters should be 
agnostic of the supported meta-frameworks.  More concretely, in our prototype 
we describe configuration data in terms of a JSON-Schema.
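
A sketch of what that marshalling step could look like, in Python.  The schema, 
the descriptor template and the helper names (marshal_config, render) are 
assumptions for the example, not the prototype's actual files or API, and the 
check below covers only a tiny subset of JSON-Schema (defaults, required keys, 
two basic types).

```python
import json
from string import Template

# Hypothetical package files: a JSON-Schema-style description of the
# site-specific settings, and a Marathon descriptor with placeholders.
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "instances": {"type": "integer", "default": 1},
        "zk_hosts": {"type": "string"},
    },
    "required": ["zk_hosts"],
}
MARATHON_TEMPLATE = (
    '{"id": "my-app", "instances": $instances, "env": {"ZK_HOSTS": $zk_hosts}}'
)
TYPES = {"integer": int, "string": str}


def marshal_config(schema, supplied):
    """Apply defaults, then check required keys and basic types."""
    config = {}
    for key, spec in schema["properties"].items():
        if key in supplied:
            config[key] = supplied[key]
        elif "default" in spec:
            config[key] = spec["default"]
    missing = [k for k in schema.get("required", []) if k not in config]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    for key, value in config.items():
        type_name = schema["properties"][key]["type"]
        if not isinstance(value, TYPES[type_name]):
            raise TypeError("{} must be of type {}".format(key, type_name))
    return config


def render(template, config):
    """Fill the descriptor; values are JSON-encoded so quoting stays valid."""
    encoded = {key: json.dumps(value) for key, value in config.items()}
    return json.loads(Template(template).substitute(encoded))


app = render(MARATHON_TEMPLATE,
             marshal_config(CONFIG_SCHEMA, {"zk_hosts": "zk://localhost:2181"}))
print(json.dumps(app, indent=2, sort_keys=True))
```

The same validated config could be fed into a template for any other 
meta-framework, which is the point of keeping the parameters agnostic.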

CLI Integration:

Part of our proposed package format is an optional descriptor for how to fetch 
and install the command-line tools for interacting with the application.  For 
now, we only have one implementation of this, which is to fetch a python egg 
from PyPI.

Governance:

All in all, we think that making this effort more community driven is a healthy 
way to proceed.  Any input is very welcome.  For example, if others think that 
what we have is a good starting point we could transfer ownership of the 
repository to the mesos organization on GitHub.

Cheers,
--
Connor Doyle
http://mesosphere.com




> On Nov 30, 2014, at 17:32, Dave Lester  wrote:
> 
> As the number of Mesos frameworks grows (and now, a module system), I think 
> it's time to create a community-maintained registry with the goal of making 
> frameworks and modules easier to discover, contribute to, and install.
> 
> There's already a JIRA ticket tracking this (MESOS-1759) and I've chatted 
> with several folks (thanks in particular Victor Vieux, Tom Arnfeld, Vinod 
> Kone, Timothy St Clair, and Joe Stein). I'd like to advance the conversation 
> by offering a proposal on the public mailing list.
> 
> I imagine two initiatives to achieve this:
> 
> 1) Shared hosting via a GitHub org. I'm not sure if you're familiar with how 
> Jenkins main

Re: Task Checkpointing with Mesos, Marathon and Docker containers

2014-11-25 Thread Connor Doyle
Hi Geoffroy,

For the Marathon instances, in all released versions of Marathon you must supply 
the --checkpoint flag to turn on task checkpointing for the framework.  We've 
changed the default to true starting with the next release.

There is a bug in Mesos where the FrameworkInfo does not get updated when a 
framework re-registers.  This means that if you shut down Marathon and restart 
it with --checkpoint, the Mesos master (with the same FrameworkId, which 
Marathon picks up from ZK) will ignore the new setting.  For reference, here is 
the design doc to address that: 
https://cwiki.apache.org/confluence/display/MESOS/Design+doc%3A+Updating+Framework+Info

Fortunately, there is an easy workaround.

1) Shut down Marathon (tasks keep running)
2) Restart the leading Mesos master (tasks keep running)
3) Start Marathon with --checkpoint enabled

This works by clearing the Mesos master's in-memory state.  It is rebuilt as 
the slave nodes and frameworks re-register.

Please report back if this doesn't solve the issue for you.
--
Connor


> On Nov 25, 2014, at 07:43, Geoffroy Jabouley  
> wrote:
> 
> Hello
> 
> i am currently trying to activate checkpointing for my Mesos cloud.
> 
> Starting from an application running in a docker container on the cluster, 
> launched from marathon, my use cases are as follows:
> 
> UC1: kill the marathon service, then restart after 2 minutes.
> Expected: the mesos task is still active, the docker container is running. 
> When the marathon service restarts, it gets back its tasks.
> 
> Result: OK
> 
> 
> UC2: kill the mesos slave, then restart after 2 minutes.
> Expected: the mesos task remains active, the docker container is running. 
> When the mesos slave service restarts, it gets back its tasks. Marathon does 
> not show error.
> 
> Results: the task gets status LOST when the slave is killed. Docker container 
> still running.  Marathon detects the application went down and spawns a new 
> one on another available mesos slave. When the slave restarts, it kills the 
> previous running container and starts a new one. So i end up with 2 
> applications on my cluster, one spawned by Marathon, and another orphan one.
> 
> 
> Is this behavior normal? Can you please explain what i am doing wrong?
> 
> ---
> 
> Here is the configuration i have come so far:
> Mesos 0.19.1 (not dockerized)
> Marathon 0.6.1 (not dockerized)
> Docker 1.3 + Deimos 0.4.2
> 
> Mesos master is started:
> /usr/local/sbin/mesos-master --zk=zk://...:2181/mesos --port=5050 
> --log_dir=/var/log/mesos --cluster=CLUSTER_POC --hostname=... --ip=... 
> --quorum=1 --work_dir=/var/lib/mesos
> 
> Mesos slave is started:
> /usr/local/sbin/mesos-slave --master=zk://...:2181/mesos 
> --log_dir=/var/log/mesos --checkpoint=true 
> --containerizer_path=/usr/local/bin/deimos 
> --executor_registration_timeout=5mins --hostname=... --ip=... 
> --isolation=external --recover=reconnect --recovery_timeout=120mins 
> --strict=true
> 
> Marathon is started:
> java -Xmx512m -Djava.library.path=/usr/local/lib 
> -Djava.util.logging.SimpleFormatter.format=%2$s %5$s%6$s%n -cp 
> /usr/local/bin/marathon mesosphere.marathon.Main --zk zk://...:2181/marathon 
> --master zk://...:2181/mesos --local_port_min 3 --hostname ... 
> --event_subscriber http_callback --http_port 8080 --task_launch_timeout 
> 30 --local_port_max 4 --ha --checkpoint
> 
> 
> 
> 



Re: args for Docker run surrounded by quotes

2014-10-29 Thread Connor Doyle
Andrew, could you explain what you changed to make this work?

Marathon doesn't expose a `shell` argument; it's set implicitly by using either 
`cmd` or `args` in the app JSON.  `args` is what you want (sets shell to false) 
if you are using a Dockerfile with an ENTRYPOINT clause.  `args` is an array, 
and it looks like Mesos is wrapping each argument in quotes.  Did you try 
passing the arguments as separate array elements?


>>> "args": ["--master zk://...:2181/mesos --zk_hosts zk:/...:2181"],

"args": ["--master", "zk://...:2181/mesos", "--zk_hosts", "zk:/...:2181"],

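If the command line already exists as a single string, one way to get the 
array form is to split it the way a shell would.  A small Python sketch; the 
ZooKeeper host below is a placeholder, not a value from this thread:

```python
import json
import shlex

# Placeholder command line; substitute your own ZooKeeper hosts.
command_line = (
    "--master zk://zk.example.com:2181/mesos "
    "--zk_hosts zk://zk.example.com:2181"
)

# shlex.split tokenizes like a POSIX shell, giving one element per argument.
args = shlex.split(command_line)
print(json.dumps({"args": args}, indent=2))
```

Each flag and each value ends up as its own array element, so the quoting 
applied per element no longer glues the whole line into one argument.
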
--
Connor


On Oct 29, 2014, at 9:44, Andrew Jones  wrote:

> Thanks a lot Tim. That worked perfectly.
> 
> Thanks,
> Andrew
> 
> On Wed, Oct 29, 2014, at 03:58 PM, Timothy Chen wrote:
>> Hi Andrew,
>> 
>> By default shell is enabled, which wraps your command in bin/sh and
>> single quotes.
>> 
>> Try passing shell false to marathon.
>> 
>> Tim
>> 
>> Sent from my iPhone
>> 
>>> On Oct 29, 2014, at 4:44 AM, Andrew Jones  
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm trying to run a Docker image which has a defined entrypoint and pass
>>> args to it. It looks like when the args are passed to docker run, they
>>> are surrounded by single quotes.
>>> 
>>> The image I am trying to run is tomaskral/chronos, and this is the
>>> configuration I am giving to Marathon:
>>> 
>>> {
>>>   "id": "chronos-test-2",
>>>   "container": {
>>>     "docker": {
>>>       "image": "tomaskral/chronos",
>>>       "network": "BRIDGE",
>>>       "portMappings": [
>>>         {
>>>           "containerPort": 8080,
>>>           "hostPort": 0,
>>>           "servicePort": 31000,
>>>           "protocol": "tcp"
>>>         }
>>>       ]
>>>     },
>>>     "type": "DOCKER",
>>>     "volumes": []
>>>   },
>>>   "ports": [31000],
>>>   "args": ["--master zk://...:2181/mesos --zk_hosts zk:/...:2181"],
>>>   "cpus": 0.2,
>>>   "mem": 256.0,
>>>   "instances": 1
>>> }
>>> 
>>> And this is an extract from the log from Mesos when the image is run:
>>> 
>>> + logged chronos run_jar '--master zk://...:2181/mesos --zk_hosts
>>> zk://...:2181'
>>> 
>>> The argument has single quotes around it. run_jar is calling java, which
>>> cannot handle the quotes, and the process isn't starting.
>>> 
>>> If I run the image locally with docker run like this, it works:
>>> 
>>> docker run -p 8080:8080 tomaskral/chronos --master zk://...:2181/mesos
>>> --zk_hosts zk://...:2181
>>> 
>>> But adding quotes, like this, and I get the same output as I did from
>>> Mesos:
>>> 
>>> docker run -p 8080:8080 tomaskral/chronos '--master zk://...:2181/mesos
>>> --zk_hosts zk://...:2181'
>>> 
>>> So I think these quotes are being added by either Marathon or Mesos when
>>> calling docker run, which the java command inside the container can't
>>> handle.
>>> 
>>> Is it Mesos or Marathon adding the quotes? Is this something that should
>>> be fixed, or should the docker images expect this and cope?
>>> 
>>> This is Mesos 0.21.1 and Marathon 0.7.3. I have also asked the author of
>>> the image for help (https://github.com/kadel/Dockerfiles/issues/3).
>>> 
>>> Thanks,
>>> Andrew



Re: Docker odd behavior

2014-10-22 Thread Connor Doyle
Hi Eduardo,

There is a known defect in Mesos that matches your description:
https://issues.apache.org/jira/browse/MESOS-1915
https://issues.apache.org/jira/browse/MESOS-1884

A fix will be included in the next release.
https://reviews.apache.org/r/26486

You see the killTask because the default --task_launch_timeout value for 
Marathon is 60 seconds.
Created an issue to make the logging around this better:
https://github.com/mesosphere/marathon/issues/732
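
To check the timing against the slave log, the gap between the "Starting 
container" and "Asked to kill task" lines can be computed directly.  A quick 
Python sketch; glog_time is a throwaway helper written for this example, the 
two log lines are shortened copies of the ones quoted below, and it assumes 
both lines fall on the same day.

```python
from datetime import datetime


def glog_time(line):
    """Parse the time of day from a glog line such as
    'I1022 17:44:13.989893 15195 docker.cpp:743] ...'."""
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")


started = glog_time("I1022 17:44:13.989893 15195 docker.cpp:743] Starting container")
killed = glog_time("I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task")

elapsed = (killed - started).total_seconds()
print("seconds between launch and kill: {:.3f}".format(elapsed))
```

The result is just under 61 seconds, which lines up with the 60 second 
default plus a little delivery delay.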

--
Connor


On Oct 22, 2014, at 16:18, Eduardo Jiménez  wrote:

> Hi,
> 
> I've started experimenting with mesos using the docker containerizer, and 
> running a simple example got into a very strange state.
> 
> I have mesos-0.20.1, marathon-0.7 setup on EC2, using Amazon Linux:
> 
> Linux  3.14.20-20.44.amzn1.x86_64 #1 SMP Mon Oct 6 22:52:46 UTC 2014 
> x86_64 x86_64 x86_64 GNU/Linux
> 
> Docker version 1.2.0, build fa7b24f/1.2.0
> 
> I start the mesos slave with these relevant options:
> 
> --cgroups_hierarchy=/cgroup
> --containerizers=docker,mesos
> --executor_registration_timeout=5mins
> --isolation=cgroups/cpu,cgroups/mem
> 
> I launched a very simple app, which is from the mesosphere examples:
> 
> {
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "libmesos/ubuntu"
>     }
>   },
>   "id": "ubuntu-docker2",
>   "instances": "1",
>   "cpus": "0.5",
>   "mem": "512",
>   "uris": [],
>   "cmd": "while sleep 10; do date -u +%T; done"
> }
> 
> The app launches, but then mesos states the task is KILLED, yet the docker 
> container is STILL running. Here's the sequence of logs from that mesos-slave.
> 
> 1) Task gets created and assigned:
> 
> I1022 17:44:13.971096 15195 slave.cpp:1002] Got assigned task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 
> 20141017-172055-3489660938-5050-1603-
> I1022 17:44:13.971367 15195 slave.cpp:1112] Launching task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 
> 20141017-172055-3489660938-5050-1603-
> I1022 17:44:13.973047 15195 slave.cpp:1222] Queuing task 
> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' for executor 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> '20141017-172055-3489660938-5050-1603-
> I1022 17:44:13.989893 15195 docker.cpp:743] Starting container 
> 'c1fc27c8-13e9-484f-a30c-cb062ec4c978' for task 
> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' (and executor 
> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799') of framework 
> '20141017-172055-3489660938-5050-1603-'
> 
> So far so good. The log statements right next to "Starting container" is:
> 
> I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603-
> I1022 17:45:14.894579 15196 slave.cpp:2088] Handling status update 
> TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603- from @0.0.0.0:0
> W1022 17:45:14.894798 15196 slave.cpp:1354] Killing the unregistered executor 
> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' of framework 
> 20141017-172055-3489660938-5050-1603- because it has no tasks
> E1022 17:45:14.925014 15192 slave.cpp:2205] Failed to update resources for 
> container c1fc27c8-13e9-484f-a30c-cb062ec4c978 of executor 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 running task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 on status update for 
> terminal task, destroying container: No container found
> 
> After this, there's several log messages like this:
> 
> I1022 17:45:14.926197 15194 status_update_manager.cpp:320] Received status 
> update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603-
> I1022 17:45:14.926378 15194 status_update_manager.cpp:373] Forwarding status 
> update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603- to master@10.0.0.147:5050
> W1022 17:45:16.169214 15196 status_update_manager.cpp:181] Resending status 
> update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603-
> I1022 17:45:16.169275 15196 status_update_manager.cpp:373] Forwarding status 
> update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603- to master@10.0.0.147:5050
> 
> 
> Eventually the TASK_KILLED update is acked and the Mesos UI shows the task as 
> killed. By then, the process should be dead, but it's not.
> 
> $ sudo docker ps
> CONTA

Re: Reconciliation Document

2014-10-16 Thread Connor Doyle
Thanks for writing this up Ben! I have a couple suggestions about additional 
details that could be helpful to explain.

First, could you go a little more in-depth about how this process works for 
terminated tasks? For example, how does reconciliation behave for tasks running 
on a slave that has become disconnected from the master? An overview of the 
various timeouts involved would also be really awesome.

Second, what happens when a framework attempts to reconcile a task that is 
completely unknown to Mesos? An example scenario could be that a task died, the 
terminal status update was ACKed, but the scheduler failed over before this 
information could be persisted. What task status (if any) does Mesos respond 
with?
--
Connor Doyle
http://mesosphere.io


On Oct 15, 2014, at 14:05, Benjamin Mahler  wrote:

> Hi all,
> 
> I've sent a review out for a document describing reconciliation, you can see 
> the draft here:
> https://gist.github.com/bmahler/18409fc4f052df43f403
> 
> Would love to gather high level feedback on it from framework developers. 
> Feel free to reply here, or on the review:
> https://reviews.apache.org/r/26669/
> 
> Thanks!
> Ben



Re: Mesos Docker design question

2014-10-15 Thread Connor Doyle
Andy, passing the "sidekick" container ID is one issue.  But aside
from that, if you have written a custom framework what's to stop you
from waiting for a resource offer that accommodates both containers
you want to schedule and then submitting two TaskInfos in the same
call to SchedulerDriver.launchTasks?
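
Roughly, the offer check could look like the Python sketch below.  Offers and
tasks are plain dicts here rather than protobufs, and fits and pick_offer are
hypothetical helper names, so treat it as an outline of the idea only.

```python
def fits(offer, tasks):
    """True if the offer has enough of every resource for all tasks combined."""
    needed = {}
    for task in tasks:
        for name, amount in task["resources"].items():
            needed[name] = needed.get(name, 0) + amount
    return all(
        offer["resources"].get(name, 0) >= amount
        for name, amount in needed.items()
    )


def pick_offer(offers, tasks):
    """Return the first offer that can host every task, or None to keep waiting."""
    for offer in offers:
        if fits(offer, tasks):
            return offer
    return None


# Invented example data.
main_task = {"name": "app", "resources": {"cpus": 1.0, "mem": 512}}
sidekick = {"name": "sidekick", "resources": {"cpus": 0.1, "mem": 64}}
offers = [
    {"id": "offer-1", "slave_id": "slave-a", "resources": {"cpus": 1.0, "mem": 1024}},
    {"id": "offer-2", "slave_id": "slave-b", "resources": {"cpus": 2.0, "mem": 1024}},
]

chosen = pick_offer(offers, [main_task, sidekick])
print(chosen["id"])
```

Once an offer fits, both TaskInfos would go into the same launchTasks call
against that offer, so they land on the same slave together or not at all.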
--
Connor

On Wed, Oct 15, 2014 at 5:36 PM, Tim Chen  wrote:
> Hi Andy,
>
> I've definitely been seeing similar use cases popping up, and you're right
> that nothing in Mesos right out of the box has any support for co-locating
> tasks for you.
>
> For your potential solution, I don't see why you will need the container
> name or ID? TaskStatus also has slaveId so you do know which slave you want
> to launch your second task on. You will need to keep a mapping yourself that
> for your given TaskId you can now launch your 2nd task locating in the same
> slave id once you have a offer from that slave.
>
> And yes, your concern is correct: you cannot always guarantee that you can
> launch your 2nd docker task, or when you can launch it.
>
> I believe we will be thinking about more how to launch a collection of tasks
> co-located together, and what that looks like in the near future. If you
> have more requirements and thoughts of how to do so please share them as
> well.
>
> Thanks!
>
> Tim
>
> On Tue, Oct 14, 2014 at 4:57 PM, Andy Grove 
> wrote:
>>
>> We've made good progress deploying our product with Mesos but feel like we
>> may need to move away from using the mesos docker executor and roll our own
>> but at the same time I am wondering if I am just looking at the problem in
>> the wrong way, not having that much experience with mesos.
>>
>> The issue is that as well as being able to launch a docker container on a
>> slave, we also then want to be able to get information about the container
>> once it starts (like its ID or IP address) and write that information to
>> zookeeper.
>>
>> Our current approach is:
>>
>> 1. Scheduler asks mesos to execute container (e.g. use mesos docker
>> support to issue the "docker run" command)
>> 2. Have some code inside the container that gets the containers IP address
>> on startup and writes it to zookeeper
>>
>> This works but the downside is each container/image must have this extra
>> step added.
>>
>> There is a potential way of doing this in mesos instead but there are some
>> pieces missing:
>>
>> 1. Scheduler asks mesos to execute container (e.g. use mesos docker
>> support to issue the "docker run" command)
>> 2. Scheduler receives statusUpdate() saying that the task is running (but
>> we don't know the container ID or container name)
>> 3. Scheduler requests that the same slave now runs another task (custom
>> code in our product) that will get the container details and register them
>> with ZK
>>
>> There is no way for the scheduler to know the container ID which means we
>> can't schedule the follow up task.
>>
>> Even if we could do this, my concern would then be that step 3 might fail
>> if the slave no longer has spare resource.
>>
>> I'd appreciate any feedback on best practices to achieve this.
>>
>> Thanks,
>>
>> Andy.
>>
>> --
>> Andy Grove
>> VP Engineering
>> CodeFutures Corporation
>>
>>
>



-- 
connor


Re: Orphaned Docker containers in Mesos 0.20.1

2014-10-02 Thread Connor Doyle
It doesn't appear to be related to the registration timeout; based on the logs 
the time between task launch and kill was only about 4.5 seconds.
--
Connor

> On Oct 2, 2014, at 14:24, Dick Davies  wrote:
> 
> One thing to check - have you upped
> 
> --executor_registration_timeout
> 
> from the default of 1min? a docker pull can easily take longer than that.
> 
>> On 2 October 2014 22:18, Michael Babineau  wrote:
>> I'm seeing an issue where tasks are being marked as killed but remain
>> running. The tasks all run via the native Docker containerizer and are
>> started from Marathon.
>> 
>> The net result is additional, orphaned Docker containers that must be
>> stopped/removed manually.
>> 
>> Versions:
>> - Mesos 0.20.1
>> - Marathon 0.7.1
>> - Docker 1.2.0
>> - Ubuntu 14.04
>> 
>> Environment:
>> - 3 ZK nodes, 3 Mesos Masters, and 3 Mesos Slaves (all separate instances)
>> on EC2
>> 
>> Here's the task in the Mesos UI:
>> 
>> (note that stderr continues to update with the latest container output)
>> 
>> Here's the still-running Docker container:
>> $ docker ps|grep 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
>> 3d451b8213ea
>> docker.thefactory.com/ace-serialization:f7aa1d4f46f72d52f5a20ef7ae8680e4acf88bc0
>> "\"/bin/sh -c 'java26 minutes ago  Up 26 minutes   9990/tcp
>> mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
>> 
>> Here are the Mesos logs associated with the task:
>> $ grep eda431d7-4a74-11e4-a320-56847afe9799 /var/log/mesos/mesos-slave.INFO
>> I1002 20:44:39.176024  1528 slave.cpp:1002] Got assigned task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
>> 20140919-224934-1593967114-5050-1518-
>> I1002 20:44:39.176257  1528 slave.cpp:1112] Launching task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
>> 20140919-224934-1593967114-5050-1518-
>> I1002 20:44:39.177287  1528 slave.cpp:1222] Queuing task
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' for executor
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> '20140919-224934-1593967114-5050-1518-
>> I1002 20:44:39.191769  1528 docker.cpp:743] Starting container
>> '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f' for task
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' (and executor
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799') of framework
>> '20140919-224934-1593967114-5050-1518-'
>> I1002 20:44:43.707033  1521 slave.cpp:1278] Asked to kill task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-
>> I1002 20:44:43.707811  1521 slave.cpp:2088] Handling status update
>> TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518- from @0.0.0.0:0
>> W1002 20:44:43.708273  1521 slave.cpp:1354] Killing the unregistered
>> executor 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
>> 20140919-224934-1593967114-5050-1518- because it has no tasks
>> E1002 20:44:43.708375  1521 slave.cpp:2205] Failed to update resources for
>> container 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f of executor
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 running task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 on status update for
>> terminal task, destroying container: No container found
>> I1002 20:44:43.708524  1521 status_update_manager.cpp:320] Received status
>> update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-
>> I1002 20:44:43.708709  1521 status_update_manager.cpp:373] Forwarding status
>> update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518- to master@10.2.0.182:5050
>> I1002 20:44:43.728991  1526 status_update_manager.cpp:398] Received status
>> update acknowledgement (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-
>> I1002 20:47:05.904324  1527 slave.cpp:2538] Monitoring executor
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
>> '20140919-224934-1593967114-5050-1518-' in container
>> '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f'
>> I1002 20:47:06.311027  1525 slave.cpp:1733] Got registration for executor
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
>> 20140919-224934-1593967114-5050-1518- from executor(1)@10.2.1.34:29920
>> 
>> I'll typically see a barrage of these in association with a Marathon app
>> update (which deploys new tasks). Eventually, one container "sticks" and we
>> get a RUNNING task instead of a KILLED one.
>> 
>> Where else can I look?


Re: Docker Example Mesos 0.20?

2014-08-27 Thread Connor Doyle
Hi Eran, that's correct. Mesos supports multiple containerizers now.  The order 
in which they are listed is significant; as listed, the Docker containerizer 
will hand the TaskInfo on to the next containerizer if the ContainerInfo is not 
set or if the container type is not DOCKER.
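
Here is a toy Python model of that selection, just to show the ordering.  It 
is not the slave's actual code: TaskInfo is a plain dict, the function names 
are invented, and the rule for the Mesos containerizer (take anything that is 
not a Docker container) is an assumption made for the sketch.

```python
def docker_accepts(task_info):
    """Docker containerizer: only takes tasks whose ContainerInfo is DOCKER."""
    container = task_info.get("container")
    return container is not None and container.get("type") == "DOCKER"


def mesos_accepts(task_info):
    """Assumption for this sketch: takes anything that is not a Docker container."""
    container = task_info.get("container")
    return container is None or container.get("type") != "DOCKER"


ACCEPTS = {"docker": docker_accepts, "mesos": mesos_accepts}


def choose_containerizer(flag_value, task_info):
    """Walk the --containerizers list in order; the first one that accepts wins."""
    for name in flag_value.split(","):
        if ACCEPTS[name](task_info):
            return name
    return None


docker_task = {"container": {"type": "DOCKER", "docker": {"image": "libmesos/ubuntu"}}}
plain_task = {"cmd": "while sleep 10; do date -u +%T; done"}

print(choose_containerizer("docker,mesos", docker_task))
print(choose_containerizer("docker,mesos", plain_task))
```

With --containerizers=docker,mesos the first task goes to the Docker 
containerizer and the second falls through to the Mesos one.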
--
Connor

> On Aug 27, 2014, at 9:09, Eran Chinthaka Withana  
> wrote:
> 
> Thanks Frank for these instructions. I will have to wait for marathon release 
> to use this (hopefully that will happen soon)
> 
> A n00b question from me here. I noticed that we can now set 
> "--containerizers=docker,mesos". Does this mean mesos slaves will now support 
> both docker type and old containers? If we don't mention "container" section 
> in the marathon request[1], will it work using standard lxc?
> 
> 
> {
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "libmesos/ubuntu"
>     }
>   },
>   "id": "ubuntu",
>   "instances": "1",
>   "cpus": "0.5",
>   "mem": "128",
>   "uris": [],
>   "cmd": "while sleep 10; do date -u +%T; done"
> }
> 
> Thanks,
> Eran Chinthaka Withana
> 
> 
>> On Tue, Aug 26, 2014 at 11:06 PM, Frank Hinek  wrote:
>> Working here as well.  Thanks for the assist Tim!
>> 
>> Put together a post on the steps for my own reference: 
>> http://frankhinek.com/deploy-docker-containers-on-mesos-0-20/
>> 
>> 
>>> On August 26, 2014 at 4:39:38 PM, Ray Rodriguez (rayrod2...@gmail.com) 
>>> wrote:
>>> 
>>> Thanks Tim works great.  Cheers!
>>> 
>>> 
 On Tue, Aug 26, 2014 at 4:31 PM, Tim Chen  wrote:
 Hi Ray,
 
 Sorry the tutorial is not yet up to date too, once we have Marathon 0.7 
 released the tutorial will be updated as well.
 
 Here is one example for running the image:
 
 {
   "id": "inky",
   "container": {
     "docker": {
       "image": "mesosphere/inky"
     },
     "type": "DOCKER",
     "volumes": []
   },
   "args": ["hello"],
   "cpus": 0.2,
   "mem": 32.0,
   "instances": 1
 }
 
 
 
 You can also provide a "cmd" string as well.
 
 
 
 Tim
 
 
 
> On Tue, Aug 26, 2014 at 11:28 AM, Ray Rodriguez  
> wrote:
> I'm running marathon HEAD 0.7.0 against mesos 0.20.0.
> 
> My mesos slaves are running with the command line flag 
> --containerizers=docker,mesos and --isolation=cgroups/cpu,cgroups/mem
> 
> When trying to run the example listed here: 
> http://mesosphere.io/learn/run-docker-on-mesosphere-cluster/ I get the 
> following in the sandbox stderr/stdout
> 
> stdout:
> 
> Shutting down
> 
> stderr:
> 
> I0826 18:12:48.983397 28817 exec.cpp:132] Version: 0.20.0 I0826 
> 18:12:48.985131 28843 exec.cpp:379] Executor asked to shutdown
> 
> 
> 
> 
>> On Tue, Aug 26, 2014 at 2:15 PM, Frank Hinek  
>> wrote:
>> Thanks for the tip!  Building Marathon from latest master at the moment 
>> to test.
>> 
>> 
>> 
>> 
>>> On August 26, 2014 at 1:47:20 PM, Tim Chen (t...@mesosphere.io) wrote:
>>> 
>>> Hi Frank,
>>> 
>>> Yes you need Marathon 0.7 which we are working on to release soon.
>>> 
>>> In the mean time if you want you can grab latest master to experiment 
>>> with.
>>> 
>>> Thanks!
>>> 
>>> Tim
>>> 
>>> 
 On Tue, Aug 26, 2014 at 10:41 AM, Frank Hinek  
 wrote:
 I did run through that example but it fails every time.  Perhaps it is 
 because Marathon 0.6.1 doesn’t yet support the new capabilities in 
 Mesos 0.20.0.
 
 curl -X POST -H "Content-Type: application/json" 
 http://127.0.0.1:8080/v2/apps -d...@docker.json
 nullvagrant@vagrant-ubuntu-trusty-64:/tmp$ I0826 17:23:25.071254  1742 
 slave.cpp:1002] Got assigned task 
 ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 for framework 
 20140826-170643-251789322-5050-1532-
 I0826 17:23:25.072319  1742 slave.cpp:1112] Launching task 
 ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 for framework 
 20140826-170643-251789322-5050-1532-
 I0826 17:23:25.073552  1736 docker.cpp:782] No container info found, 
 skipping launch
 I0826 17:23:25.074030  1742 slave.cpp:1222] Queuing task 
 'ubuntu.afa18986-2d45-11e4-8e47-56847afe9799' for executor 
 ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 of framework 
 '20140826-170643-251789322-5050-1532-
 E0826 17:23:25.074518  1742 slave.cpp:2491] Container 
 '01966efd-f521-4f54-87e4-f84aa9adcfa9' for executor 
 'ubuntu.afa18986-2d45-11e4-8e47-56847afe9799' of framework 
 '20140826-170643-251789322-5050-1532-' failed to start: 
 TaskInfo/ExecutorInfo not supported
 E0826 17:23:

Re: Docker Example Mesos 0.20?

2014-08-27 Thread Connor Doyle
See also this Docker/Mesos/Marathon doc, hot off the presses:
https://mesosphere.github.io/marathon/docs/native-docker.html
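For anyone scripting this rather than using curl, here is a minimal Python sketch that builds the same kind of request. The app definition is Tim's example from this thread and the endpoint is the one from Frank's curl command; the helper names `build_request` and `submit` are made up for illustration, and the sketch assumes a Marathon 0.7-era server listening on 127.0.0.1:8080.

```python
import json
import urllib.request

# App definition copied from Tim Chen's example earlier in this thread.
APP = {
    "id": "inky",
    "container": {
        "docker": {"image": "mesosphere/inky"},
        "type": "DOCKER",
        "volumes": [],
    },
    "args": ["hello"],
    "cpus": 0.2,
    "mem": 32.0,
    "instances": 1,
}


def build_request(app, base_url="http://127.0.0.1:8080"):
    """Build (but do not send) a POST to Marathon's /v2/apps endpoint."""
    return urllib.request.Request(
        url=base_url + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and return (status code, response body as text)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

Calling `submit(build_request(APP))` against a running Marathon should be roughly equivalent to the curl invocation quoted below. Building and sending are kept separate so the payload can be inspected without a cluster.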
--
Connor

On Tue, Aug 26, 2014 at 11:06 PM, Frank Hinek  wrote:
> Working here as well.  Thanks for the assist Tim!
>
> Put together a post on the steps for my own reference:
> http://frankhinek.com/deploy-docker-containers-on-mesos-0-20/
>
>
> On August 26, 2014 at 4:39:38 PM, Ray Rodriguez (rayrod2...@gmail.com)
> wrote:
>
> Thanks Tim works great.  Cheers!
>
>
> On Tue, Aug 26, 2014 at 4:31 PM, Tim Chen  wrote:
>>
>> Hi Ray,
>>
>> Sorry the tutorial is not yet up to date too, once we have Marathon 0.7
>> released the tutorial will be updated as well.
>>
>> Here is one example for running the image:
>>
>> {
>>   "id": "inky",
>>   "container": {
>>     "docker": {
>>       "image": "mesosphere/inky"
>>     },
>>     "type": "DOCKER",
>>     "volumes": []
>>   },
>>   "args": ["hello"],
>>   "cpus": 0.2,
>>   "mem": 32.0,
>>   "instances": 1
>> }
>>
>>
>> You can also provide a "cmd" string as well.
>>
>>
>> Tim
>>
>>
>>
>> On Tue, Aug 26, 2014 at 11:28 AM, Ray Rodriguez 
>> wrote:
>>>
>>> I'm running marathon HEAD 0.7.0 against mesos 0.20.0.
>>>
>>> My mesos slaves are running with the command line flag
>>> --containerizers=docker,mesos and --isolation=cgroups/cpu,cgroups/mem
>>>
>>> When trying to run the example listed here:
>>> http://mesosphere.io/learn/run-docker-on-mesosphere-cluster/ I get the
>>> following in the sandbox stderr/stdout
>>>
>>> stdout:
>>>
>>> Shutting down
>>>
>>> stderr:
>>>
>>> I0826 18:12:48.983397 28817 exec.cpp:132] Version: 0.20.0
>>> I0826 18:12:48.985131 28843 exec.cpp:379] Executor asked to shutdown
>>>
>>>
>>>
>>>
>>> On Tue, Aug 26, 2014 at 2:15 PM, Frank Hinek 
>>> wrote:

>>>> Thanks for the tip!  Building Marathon from latest master at the moment
>>>> to test.
>>>>
>>>> On August 26, 2014 at 1:47:20 PM, Tim Chen (t...@mesosphere.io) wrote:
>>>>
>>>> Hi Frank,
>>>>
>>>> Yes you need Marathon 0.7 which we are working on to release soon.
>>>>
>>>> In the meantime if you want you can grab latest master to experiment
>>>> with.
>>>>
>>>> Thanks!
>>>>
>>>> Tim
>>>>
>>>> On Tue, Aug 26, 2014 at 10:41 AM, Frank Hinek  wrote:
>>>>>
>>>>> I did run through that example but it fails every time.  Perhaps it is
>>>>> because Marathon 0.6.1 doesn’t yet support the new capabilities in Mesos
>>>>> 0.20.0.
>>>>>
>>>>> curl -X POST -H "Content-Type: application/json" http://127.0.0.1:8080/v2/apps -d@docker.json
>>>>> nullvagrant@vagrant-ubuntu-trusty-64:/tmp$ I0826 17:23:25.071254  1742 slave.cpp:1002] Got assigned task ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 for framework 20140826-170643-251789322-5050-1532-
>>>>> I0826 17:23:25.072319  1742 slave.cpp:1112] Launching task ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 for framework 20140826-170643-251789322-5050-1532-
>>>>> I0826 17:23:25.073552  1736 docker.cpp:782] No container info found, skipping launch
>>>>> I0826 17:23:25.074030  1742 slave.cpp:1222] Queuing task 'ubuntu.afa18986-2d45-11e4-8e47-56847afe9799' for executor ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 of framework '20140826-170643-251789322-5050-1532-
>>>>> E0826 17:23:25.074518  1742 slave.cpp:2491] Container '01966efd-f521-4f54-87e4-f84aa9adcfa9' for executor 'ubuntu.afa18986-2d45-11e4-8e47-56847afe9799' of framework '20140826-170643-251789322-5050-1532-' failed to start: TaskInfo/ExecutorInfo not supported
>>>>> E0826 17:23:25.074937  1742 slave.cpp:2577] Termination of executor 'ubuntu.afa18986-2d45-11e4-8e47-56847afe9799' of framework '20140826-170643-251789322-5050-1532-' failed: No container found
>>>>> E0826 17:23:25.075564  1742 slave.cpp:2863] Failed to unmonitor container for executor ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 of framework 20140826-170643-251789322-5050-1532-: Not monitored
>>>>> I0826 17:23:25.076370  1742 slave.cpp:2087] Handling status update TASK_FAILED (UUID: 0da7c07d-aeb3-4aa3-a457-0dfcf0243914) for task ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 of framework 20140826-170643-251789322-5050-1532- from @0.0.0.0:0
>>>>> E0826 17:23:25.076938  1742 slave.cpp:2204] Failed to update resources for container 01966efd-f521-4f54-87e4-f84aa9adcfa9 of executor ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 running task ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 on status update for terminal task, destroying container: No container found
>>>>> I0826 17:23:25.077309  1737 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 0da7c07d-aeb3-4aa3-a457-0dfcf0243914) for task ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 of framework 20140826-170643-251789322-5050-1532-
>>>>> I0826 17:23:25.077424  1737 stat

Re: Alternate HDFS Filesystems + Hadoop on Mesos

2014-08-24 Thread Connor Doyle

> Also, fwiw I'm interested in rallying folks on a Tachyon Framework in the 
> not-too-distant future, for anyone who is interested.  Probably follow the 
> spark model and try to push upstream.   

Hi Tim, late follow-up:

The not-too-distant future is here!  Adam and I took a stab at a Tachyon 
framework during the MesosCon hackathon 
(http://github.com/mesosphere/tachyon-mesos).
We started writing in Scala, but we're not at all opposed to switching to Java, 
especially if the work can be upstreamed.
--
Connor

> 
> 
>> On Fri, Aug 15, 2014 at 5:16 PM, John Omernik  wrote:
>> I tried hdfs:/// and hdfs://cldbnode:7222/. Neither worked (examples below). I 
>> really think the hdfs vs other prefixes should be looked at. Like I said 
>> above, the tachyon project just added an env variable to address this.  
>> 
>> 
>> 
>> hdfs://cldbnode:7222/
>> WARNING: Logging before InitGoogleLogging() is written to STDERR
>> I0815 19:14:17.101666 22022 fetcher.cpp:76] Fetching URI 
>> 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
>> I0815 19:14:17.101780 22022 fetcher.cpp:105] Downloading resource from 
>> 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
>> '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
>> E0815 19:14:17.778833 22022 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
>> fs -copyToLocal 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
>> '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
>> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please 
>> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties 
>> files.
>> -copyToLocal: Wrong FS: 
>> maprfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, expected: 
>> hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
>> Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
>>  ... 
>> Failed to fetch: hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
>> Failed to synchronize with slave (it's probably exited)
>> 
>> 
>> 
>> hdfs:/// 
>> 
>> 
>> I0815 19:10:45.006803 21508 fetcher.cpp:76] Fetching URI 
>> 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
>> I0815 19:10:45.007099 21508 fetcher.cpp:105] Downloading resource from 
>> 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
>> '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
>> E0815 19:10:45.681922 21508 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
>> fs -copyToLocal 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
>> '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
>> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please 
>> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties 
>> files.
>> -copyToLocal: Wrong FS: maprfs:/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, 
>> expected: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
>> Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
>>  ... 
>> Failed to fetch: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
>> Failed to synchronize with slave (it's probably exited)
>> 
>> 
>>> On Fri, Aug 15, 2014 at 5:38 PM, John Omernik  wrote:
>>> I am away from my cluster right now. I tried doing a hadoop fs -ls 
>>> maprfs:// and that worked.   When I tried hadoop fs -ls hdfs:/// it failed 
>>> with wrong fs type.  With that error I didn't try it in the mapred-site.  I 
>>> will try it.  Still... why hard code the file prefixes? I guess I am curious 
>>> on how glusterfs would work, or others as they pop up. 
>>> 
>>>> On Aug 15, 2014 5:04 PM, "Adam Bordelon"  wrote:
>>>> Can't you just use the hdfs:// protocol for maprfs? That should work just
>>>> fine.
>>>>
>>>>> On Fri, Aug 15, 2014 at 2:50 PM, John Omernik  wrote:
>>>>> Thanks all.
>>>>>
>>>>> I realized MapR has a workaround for me that I will try soon: I
>>>>> have MapR fs NFS mounted on each node, i.e. I should be able to get the
>>>>> tar from there.
>>>>>
>>>>> That said, perhaps someone with better coding skills than me could
>>>>> provide an env variable where a user could provide the HDFS prefixes to
>>>>> try. I know we did that with the tachyon project and it works well for
>>>>> other HDFS compatible fs implementations, perhaps that would work here?
>>>>> Hard coding a pluggable system seems like a long term issue that will
>>>>> keep coming up.
>>>>>
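John's suggestion above, an environment variable listing the URI prefixes that should be handed to the Hadoop client, could look roughly like the Python sketch below. This is purely a hypothetical illustration of the idea: the variable name `FETCHER_HADOOP_URI_PREFIXES` and both helper functions are invented here, and nothing in this thread says the Mesos fetcher exposes such a setting.

```python
import os
from typing import List, Mapping

# Prefix the thread describes as hard-coded today.
DEFAULT_PREFIXES = ["hdfs://"]

# Hypothetical variable name, invented for this sketch.
ENV_VAR = "FETCHER_HADOOP_URI_PREFIXES"


def hadoop_prefixes(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the default prefixes plus any comma-separated extras from the environment."""
    raw = env.get(ENV_VAR, "")
    extras = [item.strip() for item in raw.split(",") if item.strip()]
    return DEFAULT_PREFIXES + [p for p in extras if p not in DEFAULT_PREFIXES]


def use_hadoop_client(uri: str, env: Mapping[str, str] = os.environ) -> bool:
    """Decide whether a URI should be fetched via `hadoop fs -copyToLocal`."""
    return any(uri.startswith(prefix) for prefix in hadoop_prefixes(env))
```

With the variable set to something like `maprfs://,glusterfs://`, a `maprfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz` URI would be routed to the Hadoop client in this model without any change to the hard-coded defaults.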

Re: Service Discovery with Marathon and HAProxy

2014-08-20 Thread Connor Doyle
Thanks for sharing Bart!
Will definitely take this for a spin.
--
Connor

> On Aug 19, 2014, at 8:09, Bart Spaans  wrote:
> 
> Hi everyone, 
> 
> I've just released a project that might be of interest to some of you. 
> 
> It can be used to automatically reload HAProxy configurations when something 
> in Marathon changes so that traffic is proxied to the right ports on the 
> right hosts as soon as tasks are started and stopped. 
> 
> It beats a solution in cron or DNS because the changes are instant and 
> downtime is minimised. 
> 
> The project is in a fairly infant stage, but hopefully already useful to some 
> -- pull requests are welcome though!
> 
> https://github.com/opencredo/mesos_service_discovery
> 
> Kind regards, 
> Bart Spaans 
> 
> @Work_of_Bart
> 


Re: Mesos(phere) chef cookbook

2014-01-16 Thread Connor Doyle
This is great news!
Thanks Ray,
--
Connor Doyle
http://mesosphere.io

> On Jan 16, 2014, at 4:53, Ray Rodriguez  wrote:
> 
> I open sourced the mesos cookbook used at my company a few weeks ago.  This 
> mesos cookbook focuses on installing mesos via Mesosphere's packages and 
> currently supports the deb packages with rpm support to come.  It's also 
> intended to work well with some other cookbooks we are hoping to open source 
> in the next week or so for the Marathon and Chronos frameworks.
> 
> It also has options for zookeeper discovery via the netflix exhibitor 
> cookbook which we use extensively for all of our zookeeper/exhibitor 
> infrastructure.
> 
> I hope to bring the cookbook up to feature parity with the latest versions of 
> mesos as they are packaged by mesosphere and add good test coverage via 
> serverspec/chefspec and test-kitchen.
> 
> https://github.com/mdsol/mesos_cookbook
> 
> Thanks.
> 
> Ray Rodriguez
> 
> Cloud Engineer
> Medidata Solutions