Re: Executors no longer inherit environment variables from the agent

2016-03-10 Thread Connor Doyle
Rodrick, in your case those environment variables are set by the
framework as part of the TaskInfo, so those shouldn't be affected by
the change.

On Thu, Mar 10, 2016 at 10:38 AM, Rodrick Brown  wrote:
> This is unfortunate; we are using environment variables that get passed into
> the executor's context, such as
>
> CHRONOS_RESOURCE_MEM
> MARATHON_APP_RESOURCE_MEM
>
> What will be the workaround?
>
> --
>
> Rodrick Brown / Systems Engineer
>
> +1 917 445 6839 / rodr...@orchardplatform.com
>
> Orchard Platform
>
> 101 5th Avenue, 4th Floor, New York, NY 10003
>
> http://www.orchardplatform.com
>
> Orchard Blog | Marketplace Lending Meetup
>>
>> On Mar 8 2016, at 2:33 pm, Gilbert Song  wrote:
>>
>> Hi,
>>
>>
>> TL;DR Executors will no longer inherit environment variables from the
>> agent by default in 0.30.
>>
>>
>> Currently, executors inherit environment variables from the agent
>> in the Mesos containerizer by default. This is an unfortunate legacy behavior
>> and is insecure. If you do have environment variables that you want to pass
>> to the executors, you can set them explicitly by using the
>> `--executor_environment_variables` agent flag.
>>
>>
>> Starting from 0.30, we will no longer allow executors to inherit
>> environment variables from the agent. In other words,
>> `--executor_environment_variables` will be set to “{}” by default. If you do
>> depend on the original behavior, please set
>> `--executor_environment_variables` flag explicitly.
>>
>>
>> Let us know if you have any comments or concerns.
>>
>>
>> Thanks,
>>
>> Gilbert
>
>



-- 
connor
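The agent flag discussed above takes a JSON object mapping variable names to values. A minimal sketch (the master address and variable values are illustrative, not defaults):

```shell
# Start the agent with an explicit executor environment instead of letting
# executors inherit the agent's own environment (the pre-0.30 default).
mesos-slave --master=zk://zk1:2181/mesos \
  --executor_environment_variables='{
    "PATH": "/usr/local/bin:/usr/bin:/bin",
    "LD_LIBRARY_PATH": "/usr/local/lib"
  }'
```

Setting the flag to `{}` reproduces the new 0.30 default of passing no variables at all.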


Re: "Chaos monkey" for mesos?

2016-02-25 Thread Connor Doyle
There's no way to kill a single task through the Mesos control
surfaces, but if you let the "chaos" framework launch tasks as a
privileged user, you can run wild.

On Thu, Feb 25, 2016 at 2:49 PM, Srikanth Viswanathan
<srikant...@gmail.com> wrote:
> Sorry, ignore my first question. A framework can obviously kill tasks. I was
> just unsure as to whether it can kill foreign tasks, which leaves only my
> second question.
>
> On Thu, Feb 25, 2016 at 5:23 PM, Srikanth Viswanathan <srikant...@gmail.com>
> wrote:
>>
>> Appreciate all the responses here. I'll look into `mesos-execute`.
>>
>> I was thinking about the framework idea in passing but my mesos knowledge
>> isn't up to scratch yet, so I haven't been able to pursue it yet. There are
>> many questions in my mind w.r.t designing this as a framework:
>> * Doesn't a framework only receive offers from mesos and launch tasks? How
>> would a framework kill tasks? Can it also kill slaves?
>> * Is it legal in mesos for one framework to kill tasks belonging to
>> another framework?
>>
>> Thanks.
>> Srikanth
>>
>> On Thu, Feb 25, 2016 at 4:58 PM, Connor Doyle <connor@gmail.com>
>> wrote:
>>>
>>> I think you could approximate that tool's behavior with some scripting
>>> plus `mesos-execute` (ships with the distribution) or by writing a
>>> really simple framework that just turns things off.
>>>
>>> On Thu, Feb 25, 2016 at 1:14 PM, Srikanth Viswanathan
>>> <srikant...@gmail.com> wrote:
>>> > Thanks, Craig and David. I'm curious about the design and use of that
>>> > tool.
>>> > Based on the video, it looks close to what I hope to do.
>>> >
>>> > A web search didn't yield any results about it, however. Does anyone
>>> > here
>>> > know more about the dcos chaos tool?
>>> >
>>> > Thanks again.
>>> > Srikanth
>>> >
>>> > On Thu, Feb 25, 2016 at 12:21 PM, craig w <codecr...@gmail.com> wrote:
>>> >>
>>> >> here's a direct link in the video
>>> >> https://youtu.be/0I6qG9RQUnY?t=389
>>> >>
>>> >> On Thu, Feb 25, 2016 at 12:17 PM, David Wood <daw...@us.ibm.com>
>>> >> wrote:
>>> >>>
>>> >>> The DCOS tutorial mentions a chaos tool at the end of the video.  Not
>>> >>> sure if that's what you're looking for, but it might be something to
>>> >>> follow up
>>> >>> on somehow.
>>> >>>
>>> >>> https://mesosphere.com/learn/
>>> >>>
>>> >>> David Wood
>>> >>> Computing Systems for Wireless Networks
>>> >>> IBM TJ Watson Research Center
>>> >>> daw...@us.ibm.com
>>> >>> 914-945-4923 (office), 914-396-6515 (mobile)
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> From: Srikanth Viswanathan <srikant...@gmail.com>
>>> >>> To: user@mesos.apache.org
>>> >>> Date: 02/25/2016 12:01 PM
>>> >>> Subject: "Chaos monkey" for mesos?
>>> >>> 
>>> >>>
>>> >>>
>>> >>>
>>> >>> Has there been any work done to develop a "chaos monkey" analogue for
>>> >>> Mesos? I have been researching on how to write one, but I wanted to
>>> >>> know if
>>> >>> there's any work already available that I can take a look at for
>>> >>> comparison,
>>> >>> and possibly re-use.
>>> >>>
>>> >>> The end goal would be something loaded into Mesos or separate from
>>> >>> Mesos
>>> >>> that randomly kills tasks. Could it be something as simple as an
>>> >>> application
>>> >>> that uses the KILL HTTP request from the scheduler API to kill tasks?
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>> Srikanth
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >> https://github.com/mindscratch
>>> >> https://www.google.com/+CraigWickesser
>>> >> https://twitter.com/mind_scratch
>>> >> https://twitter.com/craig_links
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> connor
>>
>>
>



-- 
connor
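The two approaches from this thread, sketched as commands. The master address, framework ID, and task ID are illustrative; the KILL call shown assumes the v1 scheduler HTTP API of contemporary Mesos releases, and a real client must also hold a SUBSCRIBE connection (and, in later releases, pass the Mesos-Stream-Id header):

```shell
# Launch a disposable task with mesos-execute (ships with the distribution):
mesos-execute --master=10.0.0.1:5050 --name=chaos-probe --command='sleep 300'

# A framework can kill its *own* task via the v1 scheduler API; as noted
# above, Mesos has no control surface for killing another framework's tasks.
curl -X POST http://10.0.0.1:5050/api/v1/scheduler \
  -H 'Content-Type: application/json' \
  -d '{
        "framework_id": {"value": "20160225-000000-00000000-0001"},
        "type": "KILL",
        "kill": {"task_id": {"value": "chaos-probe"}}
      }'
```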


Re: Recommended way to discover current master

2015-08-31 Thread Connor Doyle
It's also worth noting the existence of the `mesos-resolve` binary, which can 
turn a canonical Mesos ZK string into the leading master location.
--
Connor


> On Aug 31, 2015, at 10:39, Marco Massenzio  wrote:
> 
> The easiest way is to access ZooKeeper directly - as you don't need to 
> know a priori the list of Masters; if you do, however, hitting any one of 
> them will redirect (302) to the current Leader.
> 
> If you would like to see an example of how to retrieve that info from ZK, I 
> have written about it here[0].
> Finally, we're planning to make all this available via the Mesos Commons[1] 
> library (currently, there is a PR[2] waiting to be merged).
> 
> 
> [0] 
> http://codetrips.com/2015/08/16/apache-mesos-leader-master-discovery-using-zookeeper-part-2/
> [1] https://github.com/mesos/commons
> [2] https://github.com/mesos/commons/pull/2/files
> 
> Marco Massenzio
> Distributed Systems Engineer
> http://codetrips.com
> 
> On Mon, Aug 31, 2015 at 10:25 AM, Philip Weaver  
> wrote:
> My framework knows the list of zookeeper hosts and the list of mesos master 
> hosts.
> 
> I can think of a few ways for the framework to figure out which host is the 
> current master. What would be the best? Should I check in zookeeper directly? 
> Does the mesos library expose an interface to discover the master from 
> zookeeper or otherwise? Should I just try each possible master until one 
> responds?
> 
> Apologies if this is already well documented, but I wasn't able to find it. 
> Thanks!
> 
> - Philip
> 
> 
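Both discovery routes mentioned in this thread, as commands (hostnames are illustrative):

```shell
# Resolve the leading master from the canonical ZK string:
mesos-resolve zk://zk1:2181,zk2:2181,zk3:2181/mesos

# Or ask any known master; the /master/redirect endpoint answers with a
# 302 pointing at the current leader:
curl -si http://10.0.0.2:5050/master/redirect | grep -i '^Location:'
```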



Re: Custom executor

2015-07-29 Thread Connor Doyle
You don't even have to pre-load the executor on the slave boxes -- just add it 
as a URL and it will be downloaded to the sandbox like any other resource!

 On Jul 29, 2015, at 02:47, Aaron Carey aca...@ilm.com wrote:
 
 Ah I see.. so is it simply a case of making the executor file executable, 
 putting it on the slave, and supplying the path to it in the JSON?
 
 Thanks!
 
 Aaron
 
 From: Ondrej Smola [ondrej.sm...@gmail.com]
 Sent: 29 July 2015 10:13
 To: user@mesos.apache.org
 Subject: Re: Custom executor
 
 Hi Aaron,
 
 custom executor should be supported by Marathon - I don't use it but from 
 tests in 
 
 https://github.com/mesosphere/marathon/blob/master/src/test/scala/mesosphere/mesos/TaskBuilderTest.scala#L236
 
 there is an option to specify a path to a custom executor. 
 
 https://mesosphere.github.io/marathon/docs/rest-api.html#post-/v2/apps
 
 in task definition there is executor json prop
 
 Chronos also supports this property
 
 
 Download/create some simple executor and try to test it.
 
 
 
 
 2015-07-29 11:00 GMT+02:00 Aaron Carey aca...@ilm.com:
 Hi Tim,
 
 We have some specific requirements for moving data around when executing 
 tasks on slaves, I want to be able to 'check out' a selection of files, and 
 possibly mount filesystems onto the slave (and subsequently into the 
 executing docker container). The data required  by each task is specified in 
 our database.
 
 Basically I wanted to customise an executor to prepare the data on the slave 
 before executing the docker container, rather than having to get the 
 container to download its own data or attempt to mount NFS volumes itself.
 
 I hope that all makes sense, I couldn't find a simple solution to this using 
 the existing architecture.. I'd love to know your thoughts though!
 
 Thanks,
 Aaron
 
 From: Tim Chen [t...@mesosphere.io]
 Sent: 28 July 2015 19:01
 To: user@mesos.apache.org
 Subject: Re: Custom executor
 
 Can you explain what your motivations are and what your new custom executor 
 will do?
 
 Tim
 
 On Tue, Jul 28, 2015 at 5:08 AM, Aaron Carey aca...@ilm.com wrote:
 Hi,
 
 Is it possible to build a custom executor which is not associated with a 
 particular scheduler framework? I want to be able to write a custom 
 executor which is available to multiple schedulers (eg Marathon, Chronos 
 and our own custom scheduler). Is this possible? I couldn't quite figure 
 out the best way to go about this from the docs? Is it possible to mix and 
 match languages for schedulers and executors? (ie one is python one is C++)
 
 Thanks,
 Aaron
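Combining the two points above -- Marathon's `executor` property plus shipping the binary as a fetched URI -- might look like the following sketch (the endpoint, artifact URL, and field values are illustrative, and Marathon's exact `executor` semantics vary by version):

```shell
curl -X POST http://marathon.example.com:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d '{
        "id": "/custom-executor-demo",
        "executor": "./my-executor",
        "uris": ["http://artifacts.example.com/my-executor"],
        "cpus": 0.5,
        "mem": 128,
        "instances": 1
      }'
```

The URI is downloaded into each task's sandbox before launch; make sure the fetched file carries the executable bit (or ship it in an archive that preserves it).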
 


Re: Custom executor

2015-07-29 Thread Connor Doyle
In Marathon the executor ID is unique every time, so tasks and executors will 
be in 1:1 correspondence. More generally, if you re-use the executor info 
message when launching tasks, a single executor can handle multiple tasks 
simultaneously.

 On Jul 29, 2015, at 09:43, Aaron Carey aca...@ilm.com wrote:
 
 ah cool! Will that run as one instance per task, or one scheduler per slave?
 
 
 From: Connor Doyle [connor@gmail.com]
 Sent: 29 July 2015 17:24
 To: user@mesos.apache.org
 Subject: Re: Custom executor
 
 You don't even have to pre-load the executor on the slave boxes -- just add 
 it as a URL and it will be downloaded to the sandbox like any other resource!
 
 On Jul 29, 2015, at 02:47, Aaron Carey aca...@ilm.com wrote:
 
 Ah I see.. so is it simply a case of making the executor file executable, 
 putting it on the slave, and supplying the path to it in the JSON?
 
 Thanks!
 
 Aaron
 
 From: Ondrej Smola [ondrej.sm...@gmail.com]
 Sent: 29 July 2015 10:13
 To: user@mesos.apache.org
 Subject: Re: Custom executor
 
 Hi Aaron,
 
 custom executor should be supported by Marathon - I don't use it but from 
 tests in 
 
 https://github.com/mesosphere/marathon/blob/master/src/test/scala/mesosphere/mesos/TaskBuilderTest.scala#L236
 
 there is an option to specify a path to a custom executor. 
 
 https://mesosphere.github.io/marathon/docs/rest-api.html#post-/v2/apps
 
 in task definition there is executor json prop
 
 Chronos also supports this property
 
 
 Download/create some simple executor and try to test it.
 
 
 
 
 2015-07-29 11:00 GMT+02:00 Aaron Carey aca...@ilm.com:
 Hi Tim,
 
 We have some specific requirements for moving data around when executing 
 tasks on slaves, I want to be able to 'check out' a selection of files, and 
 possibly mount filesystems onto the slave (and subsequently into the 
 executing docker container). The data required by each task is specified in 
 our database.
 
 Basically I wanted to customise an executor to prepare the data on the 
 slave before executing the docker container, rather than having to get the 
 container to download its own data or attempt to mount NFS volumes itself.
 
 I hope that all makes sense, I couldn't find a simple solution to this 
 using the existing architecture.. I'd love to know your thoughts though!
 
 Thanks,
 Aaron
 
 From: Tim Chen [t...@mesosphere.io]
 Sent: 28 July 2015 19:01
 To: user@mesos.apache.org
 Subject: Re: Custom executor
 
 Can you explain what your motivations are and what your new custom executor 
 will do?
 
 Tim
 
 On Tue, Jul 28, 2015 at 5:08 AM, Aaron Carey aca...@ilm.com wrote:
 Hi,
 
 Is it possible to build a custom executor which is not associated with a 
 particular scheduler framework? I want to be able to write a custom 
 executor which is available to multiple schedulers (eg Marathon, Chronos 
 and our own custom scheduler). Is this possible? I couldn't quite figure 
 out the best way to go about this from the docs? Is it possible to mix and 
 match languages for schedulers and executors? (ie one is python one is C++)
 
 Thanks,
 Aaron


Re: [DISCUSS] Renaming Mesos Slave

2015-06-01 Thread Connor Doyle
+1

1. Mesos Worker [node/host/machine]
2. Mesos Worker [process]
3. No, master/worker seems to address the issue with less changes.
4. Begin using the new name ASAP, add a disambiguation to the docs, and change 
old references over time.  Fixing the official name, even before changes are 
in place, would be a good first step.

--
Connor


 On Jun 1, 2015, at 14:18, Adam Bordelon a...@mesosphere.io wrote:
 
 There has been much discussion about finding a less offensive name than 
 Slave, and many of these thoughts have been captured in 
 https://issues.apache.org/jira/browse/MESOS-1478
 
 I would like to open up the discussion on this topic for one week, and if we 
 cannot arrive at a lazy consensus, I will draft a proposal from the 
 discussion and call for a VOTE.
 Here are the questions I would like us to answer:
 1. What should we call the Mesos Slave node/host/machine?
 2. What should we call the mesos-slave process (could be the same)?
 3. Do we need to rename Mesos Master too?
 
 Another topic worth discussing is the deprecation process, but we don't 
 necessarily need to decide on that at the same time as deciding the new 
 name(s).
 4. How will we phase in the new name and phase out the old name?
 
 Please voice your thoughts and opinions below.
 
 Thanks!
 -Adam-
 
 P.S. My personal thoughts:
 1. Mesos Worker [Node]
 2. Mesos Worker or Agent
 3. No
 4. Carefully



Re: CPU resource allocation: ignore?

2015-03-11 Thread Connor Doyle
If you don't care at all about accounting usage of that resource then you 
should be able to set it to 0.0.  As Ian mentioned, this won't be enforced with 
the cpu isolator disabled.
--
Connor

 On Mar 11, 2015, at 08:43, Ian Downes idow...@twitter.com wrote:
 
 The --isolation flag for the slave determines how resources are *isolated*, 
 i.e., by not specifying any cpu isolator there will be no isolation between 
 executors for cpu usage; the Linux scheduler will try to balance their 
 execution.
 
 Cpu and memory are considered required resources for executors and I believe 
 the master enforces this.
 
 What behavior are you trying to achieve? If your jobs don't require much 
 cpu then can you not just set a small value, like 0.25 cpu?
 
 On Wed, Mar 11, 2015 at 7:20 AM, Geoffroy Jabouley 
 geoffroy.jabou...@gmail.com wrote:
 Hello
 
 As cpu relative shares are *not very* relevant in our heterogeneous cluster, 
 we would like to get rid of CPU resource management and use only MEM 
 resources for our cluster and task allocation.
 
 Even when modifying the isolation flag of our slave to 
 --isolation=cgroups/mem, we see these in the logs:
 
 from the slave, at startup:
 I0311 15:09:55.006750 50906 slave.cpp:289] Slave resources: 
 ports(*):[31000-32000, 80-443]; cpus(*):2; mem(*):1979; disk(*):22974
 
 from the master:
 I0311 15:15:16.764714 50884 hierarchical_allocator_process.hpp:563] 
 Recovered ports(*):[31000-32000, 80-443]; cpus(*):2; mem(*):1979; 
 disk(*):22974 (total allocatable: ports(*):[31000-32000, 80-443]; cpus(*):2; 
 mem(*):1979; disk(*):22974) on slave 
 20150311-150951-3982541578-5050-50860-S0 from framework 
 20150311-150951-3982541578-5050-50860-
 
 And mesos master UI is showing both CPU and MEM resources status.
 
 
 
 Btw, we are using Marathon and Jenkins frameworks to start our mesos tasks, 
 and the cpus field seems mandatory (set to 1.0 by default). So I guess you 
 cannot easily bypass cpu resource allocation...
 
 
 Any idea?
 Regards
 
 2015-02-19 15:15 GMT+01:00 Ryan Thomas r.n.tho...@gmail.com:
 Hey Don,
 
 Have you tried only setting the 'cgroups/mem' isolation flag on the slave 
 and not the cpu one? 
 
 http://mesosphere.com/docs/reference/mesos-slave/
 
 
 ryan
 
 On 19 February 2015 at 14:13, Donald Laidlaw donlaid...@me.com wrote:
 I am using Mesos 0.21.1 with Marathon 0.8.0 and running everything in 
 docker containers.
 
 Is there a way to have mesos ignore the cpu relative shares? That is, not 
 limit the docker container CPU at all when it runs. I would still want to 
 have the Memory resource limitation, but would rather just let the linux 
 system under the containers schedule all the CPU.
 
 This would allow us to just allocate tasks to mesos slaves based on 
 available memory only, and to let those tasks get whatever CPU they could 
 when they needed it. This is desireable where there can be lots of 
 relative high memory tasks that have very low CPU requirements. Especially 
 if we do not know the capabilities of the slave machines with regards to 
 CPU. Some of them may have fast CPU's, some slow, so it is hard to pick a 
 relative number for that slave.
 
 Thanks,
 
 Don Laidlaw
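A sketch of the advice in this thread -- enforce only memory, and give tasks a small nominal cpu value (the master address is illustrative):

```shell
# Agent with only the memory cgroup isolator: cpus are still tracked for
# offers and accounting, but no CPU limit is enforced on running executors.
mesos-slave --master=zk://zk1:2181/mesos --isolation=cgroups/mem

# Frameworks then request a token amount, e.g. "cpus": 0.1 in a Marathon
# app definition, since the cpus field itself cannot be omitted.
```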
 


Re: Question about External Containerizer

2014-12-03 Thread Connor Doyle
You're right Sharma, it's dependent upon the framework.  If your scheduler sets 
a unique ExecutorID for each TaskInfo, then the executor will not be re-used 
and you won't have to worry about resizing the executor's container to 
accomodate subsequent tasks.  This might be a reasonable simplification to 
start with, especially if your executor adds relatively low resource overhead.
--
Connor


 On Dec 3, 2014, at 10:20, Sharma Podila spod...@netflix.com wrote:
 
 This may have to do with fine-grained vs. coarse-grained resource allocation. 
 Things may be easier for you, Diptanu, if you are using one Docker container 
 per task (sort of coarse grain). In that case, I believe there's no need to 
 alter a running Docker container's resources. Instead, the resource update of 
 your executor translates into the right Docker containers running. There's 
 some details to be worked out there, I am sure. 
 It sounds like Tom's strategy uses the same Docker container for multiple 
 tasks. Tom, do correct me otherwise.
 
 On Wed, Dec 3, 2014 at 3:38 AM, Tom Arnfeld t...@duedil.com wrote:
 When Mesos is asked to launch a task (with either a custom Executor or the 
 built in CommandExecutor) it will first spawn the executor which _has_ to be 
 a system process, launched via command. This process will be launched inside 
 of a Docker container when using the previously mentioned containerizers.
 
 Once the Executor registers with the slave, the slave will send it a number 
 of launchTask calls based on the number of tasks queued up for that executor. 
 The Executor can then do as it pleases with those tasks, whether it's just a 
 sleep(1) or to spawn a subprocess and do some other work. Given it is 
 possible for the framework to specify resources for both tasks and executors, 
 and the only thing which _has_ to be a system process is the executor, the 
 mesos slave will limit the resources of the executor process to the sum of 
 (TaskInfo.Executor.Resources + TaskInfo.Resources). 
 
 Mesos also has the ability to launch new tasks on an already running 
 executor, so it's important that mesos is able to dynamically scale the 
 resource limits up and down over time. Designing a framework around this idea 
 can lead to some complex and powerful workflows which would be a lot more 
 complex to build without Mesos.
 
 Just for an example... Spark.
 
 1) User launches a job on spark to map over some data
 2) Spark launches a first wave of tasks based on the offers it received 
 (let's say T1 and T2)
 3) Mesos launches executors for those tasks (let's say E1 and E2) on 
 different slaves
 4) Spark launches another wave of tasks based on offers, and tells mesos to 
 use the same executor (E1 and E2)
 5) Mesos will simply call launchTasks(T{3,4}) on the two already running 
 executors
 
 At point (3) mesos is going to launch a Docker container and execute your 
 executor. However at (5) the executor is already running so the tasks will be 
 handed to the already running executor. 
 
 Mesos will guarantee you (I'm 99% sure) that the resources for your container 
 have been updated to reflect the limits set on the tasks before handing the 
 tasks to you.
 
 I hope that makes some sense!
 
 --
 
 Tom Arnfeld
 Developer // DueDil
 
 
 On Wed, Dec 3, 2014 at 10:54 AM, Diptanu Choudhury dipta...@gmail.com wrote:
 
 Thanks for the explanation Tom, yeah I just figured that out by reading your 
 code! You're touching the memory.soft_limit_in_bytes and 
 memory.limit_in_bytes directly.
 
 Still curious to understand in which situations Mesos Slave would call the 
 external containerizer to update the resource limits of a container? My 
 understanding was that once resource allocation happens for a task, resources 
 are not taken away until the task exits[fails, crashes or finishes] or Mesos 
 asks the slave to kill the task. 
 
 On Wed, Dec 3, 2014 at 2:47 AM, Tom Arnfeld t...@duedil.com wrote:
 Hi Diptanu,
 
 That's correct, the ECP has the responsibility of updating the resources for a 
 container, and it will do so as new tasks are launched and killed for an 
 executor. Since docker doesn't support this, our containerizer (Deimos does 
 the same) goes behind docker to the cgroup for the container and updates the 
 resources in a very similar way to the mesos-slave. I believe this is also 
 what the built in Docker containerizer will do.
 
 https://github.com/duedil-ltd/mesos-docker-containerizer/blob/master/containerizer/commands/update.py#L35
 
 Tom.
 
 --
 
 Tom Arnfeld
 Developer // DueDil
 
 
 On Wed, Dec 3, 2014 at 10:45 AM, Diptanu Choudhury dipta...@gmail.com wrote:
 
 Hi,
 
 I had a quick question about the external containerizer. I see that once the 
 Task is launched, the ECP can receive the update calls, and the protobuf 
 message passed to ECP with the update call is containerizer::Update. 
 
 This protobuf has a Resources [list] field so does that mean Mesos might ask 
 a running task to re-adjust the enforced resource limits? 
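Behind Docker, the update step this thread describes amounts to writing the new limits into the container's cgroups directly. A sketch (cgroup mount points and the container ID are illustrative and vary by distro and Docker version):

```shell
CID=<container-id>   # hypothetical Docker container ID
MEM=/sys/fs/cgroup/memory/docker/$CID
CPU=/sys/fs/cgroup/cpu/docker/$CID

# Apply the sum of executor + task resources behind Docker's back:
echo $((512 * 1024 * 1024))  > "$MEM/memory.soft_limit_in_bytes"
echo $((1024 * 1024 * 1024)) > "$MEM/memory.limit_in_bytes"
echo 512                     > "$CPU/cpu.shares"   # ~0.5 cpu at 1024 shares per cpu
```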

Re: Proposal: shared Mesos framework hosting and registry

2014-12-01 Thread Connor Doyle
Hi Dave,

This is a timely topic, since we have been prototyping and mocking up something 
similar at Mesosphere.  We created a new public GitHub repository for it about 
three weeks ago called universe (http://github.com/mesosphere/universe).

Although we have added some informal specs, it's very malleable at this point.  
We're very much interested in making our universe compatible with, or the 
same as, the registry you're proposing.  Without delving into implementation 
details, some of the goals we have in mind are outlined below.

Data Source:

The package repository should be easily consumable by third-party command-line 
and other programs.  There should be a condensed “index” representation of the 
package repository available.

Packages within the repository should be versioned.

The package repository format itself should be versioned.

Decentralization and Composability:

The package metadata should be hosted in a public place (we like GitHub) so 
that additional packages can be added by the community by simply submitting 
pull requests.  We have added some rudimentary commit hooks and automated 
validation to protect the repo against breaking changes.

It’s important that no single entity “owns the keys” to the universe, and that 
the spec and implementation remain public.  It should be easy and free for 
organizations to maintain a private package repository.

A corollary is that it should be easy for consumers to pull from a hierarchy of 
upstream repositories.  One setup we have in mind is that an organization might 
have staging and production repositories running internally.  Packages are 
pushed to staging where integration testing can run before “deployment” to 
production.  If a package isn’t in the local repository it might be looked up 
and installed from upstream.



Repositories should be able to be proxied and cached in this way.  
Organizations should be able to isolate their datacenter but also selectively 
add external packages for experimentation. The system should be sufficiently 
portable and extensible to accommodate these and similar use cases.

Meta-Framework Descriptors:

Our conception of the package repository is a bit more expansive than just 
Mesos frameworks; it includes descriptions of how to install any piece of 
server software on a Mesos cluster.  Frameworks and non-frameworks alike may be 
installed using some other meta-framework that’s responsible for starting all 
other cluster services.  Likely candidates for this role are the long-lived 
frameworks: Aurora, Marathon, Singularity, and eventually Kubernetes.  In any 
case, the repository spec should not be prescriptive with respect to this 
choice.

The package repository metadata should make it easy for Mesos framework authors 
(and authors of non-Mesos-aware programs) to describe how to install their 
software on a Mesos cluster.  To this end, our prototype package spec allows 
for Meta-framework descriptor files for each package in the repository.  For 
example for a given package we might see a `marathon.json` file as well as a 
`my-app.aurora` file.

An obvious concern is how to specify site-specific arguments upon installation. 
 Here packages should describe data that must be marshalled from the 
environment (e.g. by prompting a user) and combined with the raw meta-framework 
descriptor to launch the app.  These configuration parameters should be 
agnostic of the supported meta-frameworks.  More concretely, in our prototype 
we describe configuration data in terms of a JSON-Schema.

CLI Integration:

Part of our proposed package format is an optional descriptor for how to fetch 
and install the command-line tools for interacting with the application.  For 
now, we only have one implementation of this, which is to fetch a python egg 
from PyPI.

Governance:

All in all, we think that making this effort more community driven is a healthy 
way to proceed.  Any input is very welcome.  For example, if others think that 
what we have is a good starting point we could transfer ownership of the 
repository to the mesos organization on GitHub.

Cheers,
--
Connor Doyle
http://mesosphere.com




 On Nov 30, 2014, at 17:32, Dave Lester daveles...@gmail.com wrote:
 
 As the number of Mesos frameworks grows (and now, a module system), I think 
 it's time to create a community-maintained registry with the goal of making 
 frameworks and modules easier to discover, contribute to, and install.
 
 There's already a JIRA ticket tracking this (MESOS-1759) and I've chatted 
 with several folks (thanks in particular Victor Vieux, Tom Arnfeld, Vinod 
 Kone, Timothy St Clair, and Joe Stein). I'd like to advance the conversation 
 by offering a proposal on the public mailing list.
 
 I imagine two initiatives to achieve this:
 
 1) Shared hosting via a GitHub org. I'm not sure if you're familiar with how 
 Jenkins maintains their plugins on GitHub [1], but they allow individual 
 plugins to have their own repo within their GH
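For orientation, a single package entry in the prototype repository described above is laid out roughly as follows (the package name and version index are illustrative):

```text
repo/packages/C/cassandra/0/
  package.json    # name, version, maintainer, description
  config.json     # JSON-Schema describing site-specific install options
  marathon.json   # meta-framework descriptor rendered at install time
  command.json    # optional CLI integration (e.g. a Python egg on PyPI)
```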

Re: Task Checkpointing with Mesos, Marathon and Docker containers

2014-11-25 Thread Connor Doyle
Hi Geoffroy,

For the Marathon instances, in all released versions of Marathon you must supply 
the --checkpoint flag to turn on task checkpointing for the framework.  We've 
changed the default to true starting with the next release.

There is a bug in Mesos where the FrameworkInfo does not get updated when a 
framework re-registers.  This means that if you shut down Marathon and restart 
it with --checkpoint, the Mesos master (with the same FrameworkId, which 
Marathon picks up from ZK) will ignore the new setting.  For reference, here is 
the design doc to address that: 
https://cwiki.apache.org/confluence/display/MESOS/Design+doc%3A+Updating+Framework+Info

Fortunately, there is an easy workaround.

1) Shut down Marathon (tasks keep running)
2) Restart the leading Mesos master (tasks keep running)
3) Start Marathon with --checkpoint enabled

This works by clearing the Mesos master's in-memory state.  It is rebuilt as 
the slave nodes and frameworks re-register.

Please report back if this doesn't solve the issue for you.
--
Connor


 On Nov 25, 2014, at 07:43, Geoffroy Jabouley geoffroy.jabou...@gmail.com 
 wrote:
 
 Hello
 
 I am currently trying to activate checkpointing for my Mesos cloud.
 
 Starting from an application running in a docker container on the cluster, 
 launched from Marathon, my use cases are the following:
 
 UC1: kill the marathon service, then restart after 2 minutes.
 Expected: the mesos task is still active, the docker container is running. 
 When the marathon service restarts, it gets its tasks back.
 
 Result: OK
 
 
 UC2: kill the mesos slave, then restart after 2 minutes.
 Expected: the mesos task remains active, the docker container is running. 
 When the mesos slave service restarts, it gets its tasks back. Marathon does 
 not show an error.
 
 Results: the task gets status LOST when the slave is killed. The Docker 
 container is still running. Marathon detects that the application went down 
 and spawns a new one on another available mesos slave. When the slave 
 restarts, it kills the previously running container and starts a new one. So 
 I end up with 2 applications on my cluster, one spawned by Marathon, and 
 another orphaned one.
 
 
 Is this behavior normal? Can you please explain what I am doing wrong?
 
 ---
 
 Here is the configuration I have so far:
 Mesos 0.19.1 (not dockerized)
 Marathon 0.6.1 (not dockerized)
 Docker 1.3 + Deimos 0.4.2
 
 Mesos master is started:
 /usr/local/sbin/mesos-master --zk=zk://...:2181/mesos --port=5050 
 --log_dir=/var/log/mesos --cluster=CLUSTER_POC --hostname=... --ip=... 
 --quorum=1 --work_dir=/var/lib/mesos
 
 Mesos slave is started:
 /usr/local/sbin/mesos-slave --master=zk://...:2181/mesos 
 --log_dir=/var/log/mesos --checkpoint=true 
 --containerizer_path=/usr/local/bin/deimos 
 --executor_registration_timeout=5mins --hostname=... --ip=... 
 --isolation=external --recover=reconnect --recovery_timeout=120mins 
 --strict=true
 
 Marathon is started:
 java -Xmx512m -Djava.library.path=/usr/local/lib 
 -Djava.util.logging.SimpleFormatter.format=%2$s %5$s%6$s%n -cp 
 /usr/local/bin/marathon mesosphere.marathon.Main --zk zk://...:2181/marathon 
 --master zk://...:2181/mesos --local_port_min 3 --hostname ... 
 --event_subscriber http_callback --http_port 8080 --task_launch_timeout 
 30 --local_port_max 4 --ha --checkpoint
 
 
 
 



Re: args for Docker run surrounded by quotes

2014-10-29 Thread Connor Doyle
Andrew, could you explain what you changed to make this work?

Marathon doesn't expose a `shell` argument; it's set implicitly by using either 
`cmd` or `args` in the app JSON.  `args` is what you want (sets shell to false) 
if you are using a Dockerfile with an ENTRYPOINT clause.  `args` is an array, 
and it looks like Mesos is wrapping each argument in quotes.  Did you try 
passing the arguments as separate array elements?


 "args": ["--master zk://...:2181/mesos --zk_hosts zk://...:2181"],

"args": ["--master", "zk://...:2181/mesos", "--zk_hosts", "zk://...:2181"],
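To see why the element boundaries matter, here is a small Python illustration. The to_docker_cmdline helper and the host names are invented for illustration; it only mimics how a multi-word array element ends up as a single shell-quoted argument on the docker run command line:

```python
import shlex

# Each element of Marathon's "args" array becomes one argv entry for the
# container's ENTRYPOINT -- so a multi-word element behaves like a single
# shell-quoted argument. to_docker_cmdline is a made-up illustration.
def to_docker_cmdline(args):
    return "docker run image " + " ".join(shlex.quote(a) for a in args)

wrong = ["--master zk://host:2181/mesos --zk_hosts zk://host:2181"]
right = ["--master", "zk://host:2181/mesos", "--zk_hosts", "zk://host:2181"]

print(to_docker_cmdline(wrong))  # whole string arrives as ONE quoted argument
print(to_docker_cmdline(right))  # four separate arguments, as java expects
```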

--
Connor


On Oct 29, 2014, at 9:44, Andrew Jones andrew+me...@andrew-jones.com wrote:

 Thanks a lot Tim. That worked perfectly.
 
 Thanks,
 Andrew
 
 On Wed, Oct 29, 2014, at 03:58 PM, Timothy Chen wrote:
 Hi Andrew,
 
 By default shell is enabled, which wraps your command in /bin/sh -c and
 single quotes.
 
 Try passing shell: false to Marathon.
 
 Tim
 
 Sent from my iPhone
 
 On Oct 29, 2014, at 4:44 AM, Andrew Jones andrew+me...@andrew-jones.com 
 wrote:
 
 Hi,
 
 I'm trying to run a Docker image which has a defined entrypoint and pass
 args to it. It looks like when the args are passed to docker run, they
 are surrounded by single quotes.
 
 The image I am trying to run is tomaskral/chronos, and this is the
 configuration I am giving to Marathon:
 
 {
   "id": "chronos-test-2",
   "container": {
     "docker": {
       "image": "tomaskral/chronos",
       "network": "BRIDGE",
       "portMappings": [
         {
           "containerPort": 8080,
           "hostPort": 0,
           "servicePort": 31000,
           "protocol": "tcp"
         }
       ]
     },
     "type": "DOCKER",
     "volumes": []
   },
   "ports": [31000],
   "args": ["--master zk://...:2181/mesos --zk_hosts zk://...:2181"],
   "cpus": 0.2,
   "mem": 256.0,
   "instances": 1
 }
 
 And this is an extract from the Mesos log when the image is run:
 
 + logged chronos run_jar '--master zk://...:2181/mesos --zk_hosts
 zk://...:2181'
 
 The argument has single quotes around it. run_jar is calling java, which
 cannot handle the quotes, and the process isn't starting.
 
 If I run the image locally with docker run like this, it works:
 
 docker run -p 8080:8080 tomaskral/chronos --master zk://...:2181/mesos
 --zk_hosts zk://...:2181
 
 But adding quotes, like this, and I get the same output as I did from
 Mesos:
 
 docker run -p 8080:8080 tomaskral/chronos '--master zk://...:2181/mesos
 --zk_hosts zk://...:2181'
 
 So I think these quotes are being added by either Marathon or Mesos when
 calling docker run, which the java command inside the container can't
 handle.
 
 Is it Mesos or Marathon adding the quotes? Is this something that should
 be fixed, or should the docker images expect this and cope?
 
 This is Mesos 0.21.1 and Marathon 0.7.3. I have also asked the author of
 the image for help (https://github.com/kadel/Dockerfiles/issues/3).
 
 Thanks,
 Andrew



Re: Docker odd behavior

2014-10-22 Thread Connor Doyle
Hi Eduardo,

There is a known defect in Mesos that matches your description:
https://issues.apache.org/jira/browse/MESOS-1915
https://issues.apache.org/jira/browse/MESOS-1884

A fix will be included in the next release.
https://reviews.apache.org/r/26486

You see the killTask because the default --task_launch_timeout value for 
Marathon is 60 seconds.
Created an issue to make the logging around this better:
https://github.com/mesosphere/marathon/issues/732
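A quick sanity check on the timestamps in the slave log quoted below bears this out:

```python
from datetime import datetime

# Glog timestamps from the slave log: "Got assigned task" vs. "Asked to kill".
launch = datetime.strptime("17:44:13.971096", "%H:%M:%S.%f")
kill = datetime.strptime("17:45:14.893309", "%H:%M:%S.%f")

delta = (kill - launch).total_seconds()
print(delta)  # ~60.9s -- right at Marathon's 60s default --task_launch_timeout
```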

--
Connor


On Oct 22, 2014, at 16:18, Eduardo Jiménez yoeduar...@gmail.com wrote:

 Hi,
 
 I've started experimenting with mesos using the docker containerizer, and 
 while running a simple example I got into a very strange state.
 
 I have mesos-0.20.1 and marathon-0.7 set up on EC2, using Amazon Linux:
 
 Linux ip 3.14.20-20.44.amzn1.x86_64 #1 SMP Mon Oct 6 22:52:46 UTC 2014 
 x86_64 x86_64 x86_64 GNU/Linux
 
 Docker version 1.2.0, build fa7b24f/1.2.0
 
 I start the mesos slave with these relevant options:
 
 --cgroups_hierarchy=/cgroup
 --containerizers=docker,mesos
 --executor_registration_timeout=5mins
 --isolation=cgroups/cpu,cgroups/mem
 
 I launched a very simple app, which is from the mesosphere examples:
 
 {
   "container": {
     "type": "DOCKER",
     "docker": {
       "image": "libmesos/ubuntu"
     }
   },
   "id": "ubuntu-docker2",
   "instances": 1,
   "cpus": 0.5,
   "mem": 512,
   "uris": [],
   "cmd": "while sleep 10; do date -u +%T; done"
 }
 
 The app launches, but then mesos states the task is KILLED, yet the docker 
 container is STILL running. Here's the sequence of logs from that mesos-slave.
 
 1) Task gets created and assigned:
 
 I1022 17:44:13.971096 15195 slave.cpp:1002] Got assigned task 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 
 20141017-172055-3489660938-5050-1603-
 I1022 17:44:13.971367 15195 slave.cpp:1112] Launching task 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 
 20141017-172055-3489660938-5050-1603-
 I1022 17:44:13.973047 15195 slave.cpp:1222] Queuing task 
 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' for executor 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
 '20141017-172055-3489660938-5050-1603-
 I1022 17:44:13.989893 15195 docker.cpp:743] Starting container 
 'c1fc27c8-13e9-484f-a30c-cb062ec4c978' for task 
 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' (and executor 
 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799') of framework 
 '20141017-172055-3489660938-5050-1603-'
 
 So far so good. The log statements right after "Starting container" are:
 
 I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
 20141017-172055-3489660938-5050-1603-
 I1022 17:45:14.894579 15196 slave.cpp:2088] Handling status update 
 TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
 20141017-172055-3489660938-5050-1603- from @0.0.0.0:0
 W1022 17:45:14.894798 15196 slave.cpp:1354] Killing the unregistered executor 
 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' of framework 
 20141017-172055-3489660938-5050-1603- because it has no tasks
 E1022 17:45:14.925014 15192 slave.cpp:2205] Failed to update resources for 
 container c1fc27c8-13e9-484f-a30c-cb062ec4c978 of executor 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 running task 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 on status update for 
 terminal task, destroying container: No container found
 
 After this, there's several log messages like this:
 
 I1022 17:45:14.926197 15194 status_update_manager.cpp:320] Received status 
 update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
 20141017-172055-3489660938-5050-1603-
 I1022 17:45:14.926378 15194 status_update_manager.cpp:373] Forwarding status 
 update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
 20141017-172055-3489660938-5050-1603- to master@10.0.0.147:5050
 W1022 17:45:16.169214 15196 status_update_manager.cpp:181] Resending status 
 update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
 20141017-172055-3489660938-5050-1603-
 I1022 17:45:16.169275 15196 status_update_manager.cpp:373] Forwarding status 
 update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
 ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
 20141017-172055-3489660938-5050-1603- to master@10.0.0.147:5050
 
 
 Eventually the TASK_KILLED update is acked and the Mesos UI shows the task as 
 killed. By then, the process should be dead, but it's not.
 
 $ sudo docker ps
 CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS

Re: Reconciliation Document

2014-10-16 Thread Connor Doyle
Thanks for writing this up Ben! I have a couple suggestions about additional 
details that could be helpful to explain.

First, could you go a little more in-depth about how this process works for 
terminated tasks? For example, how does reconciliation behave for tasks running 
on a slave that has become disconnected from the master? An overview of the 
various timeouts involved would also be really awesome.

Second, what happens when a framework attempts to reconcile a task that is 
completely unknown to Mesos? An example scenario could be that a task died, the 
terminal status update was ACKed, but the scheduler failed over before this 
information could be persisted. What task status (if any) does Mesos respond 
with?
--
Connor Doyle
http://mesosphere.io


On Oct 15, 2014, at 14:05, Benjamin Mahler benjamin.mah...@gmail.com wrote:

 Hi all,
 
 I've sent a review out for a document describing reconciliation, you can see 
 the draft here:
 https://gist.github.com/bmahler/18409fc4f052df43f403
 
 Would love to gather high level feedback on it from framework developers. 
 Feel free to reply here, or on the review:
 https://reviews.apache.org/r/26669/
 
 Thanks!
 Ben



Re: Mesos Docker design question

2014-10-15 Thread Connor Doyle
Andy, passing the sidekick container ID is one issue.  But aside
from that, if you have written a custom framework what's to stop you
from waiting for a resource offer that accommodates both containers
you want to schedule and then submitting two TaskInfos in the same
call to SchedulerDriver.launchTasks?
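That approach can be sketched in a few lines of Python. The Offer and TaskInfo shapes here are plain dicts standing in for the real Mesos protobufs, and the resource names are simplified:

```python
# Hold back until a single offer can fit BOTH tasks, then launch them
# together in one launchTasks call. Simplified stand-in for real protobufs.
def pick_offer(offers, tasks):
    need_cpus = sum(t["cpus"] for t in tasks)
    need_mem = sum(t["mem"] for t in tasks)
    for offer in offers:
        if offer["cpus"] >= need_cpus and offer["mem"] >= need_mem:
            return offer  # driver.launchTasks(offer.id, tasks) would go here
    return None  # decline all and wait for a bigger offer

app = {"cpus": 0.5, "mem": 512}
sidekick = {"cpus": 0.1, "mem": 64}
offers = [{"id": "o1", "cpus": 0.5, "mem": 512},   # too small for both
          {"id": "o2", "cpus": 1.0, "mem": 1024}]  # fits both
print(pick_offer(offers, [app, sidekick])["id"])   # -> o2
```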
--
Connor

On Wed, Oct 15, 2014 at 5:36 PM, Tim Chen t...@mesosphere.io wrote:
 Hi Andy,

 I've definitely been seeing similar use cases popping up, and you're right
 that nothing in Mesos right out of the box has any support for co-locating
 tasks for you.

 For your potential solution, I don't see why you would need the container
 name or ID. TaskStatus also has slaveId, so you do know which slave you want
 to launch your second task on. You will need to keep a mapping yourself so
 that, for a given TaskId, you can launch your 2nd task on the same slave id
 once you have an offer from that slave.

 And yes, your concern is correct: you cannot always guarantee that you can
 launch, or when you can launch, your 2nd docker task.

 I believe we will be thinking more about how to launch a collection of tasks
 co-located together, and what that looks like, in the near future. If you
 have more requirements and thoughts on how to do so, please share them as
 well.

 Thanks!

 Tim

 On Tue, Oct 14, 2014 at 4:57 PM, Andy Grove andy.gr...@codefutures.com
 wrote:

 We've made good progress deploying our product with Mesos, but feel like we
 may need to move away from using the mesos docker executor and roll our own.
 At the same time, I am wondering if I am just looking at the problem in
 the wrong way, not having that much experience with mesos.

 The issue is that as well as being able to launch a docker container on a
 slave, we also then want to be able to get information about the container
 once it starts (like its ID or IP address) and write that information to
 zookeeper.

 Our current approach is:

 1. Scheduler asks mesos to execute container (e.g. use mesos docker
 support to issue the docker run command)
 2. Have some code inside the container that gets the container's IP address
 on startup and writes it to zookeeper

 This works but the downside is each container/image must have this extra
 step added.

 There is a potential way of doing this in mesos instead but there are some
 pieces missing:

 1. Scheduler asks mesos to execute container (e.g. use mesos docker
 support to issue the docker run command)
 2. Scheduler receives statusUpdate() saying that the task is running (but
 we don't know the container ID or container name)
 3. Scheduler requests that the same slave now runs another task (custom
 code in our product) that will get the container details and register them
 with ZK

 There is no way for the scheduler to know the container ID which means we
 can't schedule the follow up task.

 Even if we could do this, my concern would then be that step 3 might fail
 if the slave no longer has spare resource.

 I'd appreciate any feedback on best practices to achieve this.

 Thanks,

 Andy.

 --
 Andy Grove
 VP Engineering
 CodeFutures Corporation






-- 
connor


Re: Orphaned Docker containers in Mesos 0.20.1

2014-10-02 Thread Connor Doyle
It doesn't appear to be related to the registration timeout; based on the logs 
the time between task launch and kill was only about 4.3 seconds.
--
Connor

 On Oct 2, 2014, at 14:24, Dick Davies d...@hellooperator.net wrote:
 
 One thing to check - have you upped
 
 --executor_registration_timeout
 
 from the default of 1min? a docker pull can easily take longer than that.
 
 On 2 October 2014 22:18, Michael Babineau michael.babin...@gmail.com wrote:
 I'm seeing an issue where tasks are being marked as killed but remain
 running. The tasks all run via the native Docker containerizer and are
 started from Marathon.
 
 The net result is additional, orphaned Docker containers that must be
 stopped/removed manually.
 
 Versions:
 - Mesos 0.20.1
 - Marathon 0.7.1
 - Docker 1.2.0
 - Ubuntu 14.04
 
 Environment:
 - 3 ZK nodes, 3 Mesos Masters, and 3 Mesos Slaves (all separate instances)
 on EC2
 
 Here's the task in the Mesos UI:
 
 (note that stderr continues to update with the latest container output)
 
 Here's the still-running Docker container:
 $ docker ps|grep 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
 3d451b8213ea
 docker.thefactory.com/ace-serialization:f7aa1d4f46f72d52f5a20ef7ae8680e4acf88bc0
 "/bin/sh -c 'java   26 minutes ago   Up 26 minutes   9990/tcp
 mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
 
 Here are the Mesos logs associated with the task:
 $ grep eda431d7-4a74-11e4-a320-56847afe9799 /var/log/mesos/mesos-slave.INFO
 I1002 20:44:39.176024  1528 slave.cpp:1002] Got assigned task
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
 20140919-224934-1593967114-5050-1518-
 I1002 20:44:39.176257  1528 slave.cpp:1112] Launching task
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
 20140919-224934-1593967114-5050-1518-
 I1002 20:44:39.177287  1528 slave.cpp:1222] Queuing task
 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' for executor
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
 '20140919-224934-1593967114-5050-1518-
 I1002 20:44:39.191769  1528 docker.cpp:743] Starting container
 '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f' for task
 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' (and executor
 'serialization.eda431d7-4a74-11e4-a320-56847afe9799') of framework
 '20140919-224934-1593967114-5050-1518-'
 I1002 20:44:43.707033  1521 slave.cpp:1278] Asked to kill task
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
 20140919-224934-1593967114-5050-1518-
 I1002 20:44:43.707811  1521 slave.cpp:2088] Handling status update
 TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
 20140919-224934-1593967114-5050-1518- from @0.0.0.0:0
 W1002 20:44:43.708273  1521 slave.cpp:1354] Killing the unregistered
 executor 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
 20140919-224934-1593967114-5050-1518- because it has no tasks
 E1002 20:44:43.708375  1521 slave.cpp:2205] Failed to update resources for
 container 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f of executor
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 running task
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 on status update for
 terminal task, destroying container: No container found
 I1002 20:44:43.708524  1521 status_update_manager.cpp:320] Received status
 update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
 20140919-224934-1593967114-5050-1518-
 I1002 20:44:43.708709  1521 status_update_manager.cpp:373] Forwarding status
 update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
 20140919-224934-1593967114-5050-1518- to master@10.2.0.182:5050
 I1002 20:44:43.728991  1526 status_update_manager.cpp:398] Received status
 update acknowledgement (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
 serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
 20140919-224934-1593967114-5050-1518-
 I1002 20:47:05.904324  1527 slave.cpp:2538] Monitoring executor
 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
 '20140919-224934-1593967114-5050-1518-' in container
 '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f'
 I1002 20:47:06.311027  1525 slave.cpp:1733] Got registration for executor
 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
 20140919-224934-1593967114-5050-1518- from executor(1)@10.2.1.34:29920
 
 I'll typically see a barrage of these in association with a Marathon app
 update (which deploys new tasks). Eventually, one container sticks and we
 get a RUNNING task instead of a KILLED one.
 
 Where else can I look?


Re: Docker Example Mesos 0.20?

2014-08-27 Thread Connor Doyle
Hi Eran, that's correct. Mesos supports multiple containerizers now.  The order 
in which they are listed is significant: as listed, the Docker containerizer will 
pass on the TaskInfo if the ContainerInfo is not set or if the container type is 
not DOCKER, letting the Mesos containerizer handle it instead.
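Roughly, the dispatch works like this. This is a simplified Python stand-in for the agent's C++ logic, not the actual implementation:

```python
# With --containerizers=docker,mesos the agent tries each containerizer in
# order; the Docker one only claims tasks whose ContainerInfo type is DOCKER,
# while the Mesos containerizer accepts anything that falls through to it.
def pick_containerizer(task_info, order=("docker", "mesos")):
    for name in order:
        if name == "docker":
            container = task_info.get("container")
            if container and container.get("type") == "DOCKER":
                return "docker"
        elif name == "mesos":
            return "mesos"
    return None

print(pick_containerizer({"container": {"type": "DOCKER"}}))  # -> docker
print(pick_containerizer({"cmd": "sleep 10"}))                # -> mesos
```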
--
Connor

 On Aug 27, 2014, at 9:09, Eran Chinthaka Withana eran.chinth...@gmail.com 
 wrote:
 
 Thanks Frank for these instructions. I will have to wait for marathon release 
 to use this (hopefully that will happen soon)
 
 A n00b question from me here. I noticed that we can now set 
 --containerizers=docker,mesos. Does this mean mesos slaves will now support 
 both the docker type and the old containers? If we don't mention the 
 "container" section in the marathon request [1], will it work using the 
 standard lxc?
 
 
 {
   "container": {
     "type": "DOCKER",
     "docker": {
       "image": "libmesos/ubuntu"
     }
   },
   "id": "ubuntu",
   "instances": 1,
   "cpus": 0.5,
   "mem": 128,
   "uris": [],
   "cmd": "while sleep 10; do date -u +%T; done"
 }
 
 Thanks,
 Eran Chinthaka Withana
 
 
 On Tue, Aug 26, 2014 at 11:06 PM, Frank Hinek frank.hi...@gmail.com wrote:
 Working here as well.  Thanks for the assist Tim!
 
 Put together a post on the steps for my own reference: 
 http://frankhinek.com/deploy-docker-containers-on-mesos-0-20/
 
 
 On August 26, 2014 at 4:39:38 PM, Ray Rodriguez (rayrod2...@gmail.com) 
 wrote:
 
 Thanks Tim works great.  Cheers!
 
 
 On Tue, Aug 26, 2014 at 4:31 PM, Tim Chen t...@mesosphere.io wrote:
 Hi Ray,
 
 Sorry the tutorial is not yet up to date too, once we have Marathon 0.7 
 released the tutorial will be updated as well.
 
 Here is one example for running the image:
 
 {
   "id": "inky",
   "container": {
     "docker": {
       "image": "mesosphere/inky"
     },
     "type": "DOCKER",
     "volumes": []
   },
   "args": ["hello"],
   "cpus": 0.2,
   "mem": 32.0,
   "instances": 1
 }
 
 
 
 You can also provide a cmd string as well.
 
 
 
 Tim
 
 
 
 On Tue, Aug 26, 2014 at 11:28 AM, Ray Rodriguez rayrod2...@gmail.com 
 wrote:
 I'm running marathon HEAD 0.7.0 against mesos 0.20.0.
 
 My mesos slaves are running with the command line flag 
 --containerizers=docker,mesos and --isolation=cgroups/cpu,cgroups/mem
 
 When trying to run the example listed here: 
 http://mesosphere.io/learn/run-docker-on-mesosphere-cluster/ I get the 
 following in the sandbox stderr/stdout
 
 stdout:
 
 Shutting down
 
 stderr:
 
 I0826 18:12:48.983397 28817 exec.cpp:132] Version: 0.20.0
 I0826 18:12:48.985131 28843 exec.cpp:379] Executor asked to shutdown
 
 
 
 
 On Tue, Aug 26, 2014 at 2:15 PM, Frank Hinek frank.hi...@gmail.com 
 wrote:
 Thanks for the tip!  Building Marathon from latest master at the moment 
 to test.
 
 
 
 
 On August 26, 2014 at 1:47:20 PM, Tim Chen (t...@mesosphere.io) wrote:
 
 Hi Frank,
 
 Yes you need Marathon 0.7 which we are working on to release soon.
 
 In the mean time if you want you can grab latest master to experiment 
 with.
 
 Thanks!
 
 Tim
 
 
 On Tue, Aug 26, 2014 at 10:41 AM, Frank Hinek frank.hi...@gmail.com 
 wrote:
 I did run through that example but it fails every time.  Perhaps it is 
 because Marathon 0.6.1 doesn’t yet support the new capabilities in 
 Mesos 0.20.0.
 
 curl -X POST -H "Content-Type: application/json" 
 http://127.0.0.1:8080/v2/apps -d...@docker.json
 nullvagrant@vagrant-ubuntu-trusty-64:/tmp$ I0826 17:23:25.071254  1742 
 slave.cpp:1002] Got assigned task 
 ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 for framework 
 20140826-170643-251789322-5050-1532-
 I0826 17:23:25.072319  1742 slave.cpp:1112] Launching task 
 ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 for framework 
 20140826-170643-251789322-5050-1532-
 I0826 17:23:25.073552  1736 docker.cpp:782] No container info found, 
 skipping launch
 I0826 17:23:25.074030  1742 slave.cpp:1222] Queuing task 
 'ubuntu.afa18986-2d45-11e4-8e47-56847afe9799' for executor 
 ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 of framework 
 '20140826-170643-251789322-5050-1532-
 E0826 17:23:25.074518  1742 slave.cpp:2491] Container 
 '01966efd-f521-4f54-87e4-f84aa9adcfa9' for executor 
 'ubuntu.afa18986-2d45-11e4-8e47-56847afe9799' of framework 
 '20140826-170643-251789322-5050-1532-' failed to start: 
 TaskInfo/ExecutorInfo not supported
 E0826 17:23:25.074937  1742 slave.cpp:2577] Termination of executor 
 'ubuntu.afa18986-2d45-11e4-8e47-56847afe9799' of framework 
 '20140826-170643-251789322-5050-1532-' failed: No container found
 E0826 17:23:25.075564  1742 slave.cpp:2863] Failed to unmonitor 
 container for executor ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 of 
 framework 20140826-170643-251789322-5050-1532-: Not monitored
 I0826 17:23:25.076370  1742 slave.cpp:2087] Handling status update 
 TASK_FAILED (UUID: 0da7c07d-aeb3-4aa3-a457-0dfcf0243914) for task 
 ubuntu.afa18986-2d45-11e4-8e47-56847afe9799 of framework 
 20140826-170643-251789322-5050-1532- from @0.0.0.0:0
 E0826 17:23:25.076938  1742 

Re: Alternate HDFS Filesystems + Hadoop on Mesos

2014-08-24 Thread Connor Doyle

 Also, fwiw I'm interested in rallying folks on a Tachyon Framework in the 
 not-too-distant future, for anyone who is interested.  Probably follow the 
 spark model and try to push upstream.   

Hi Tim, late follow-up:

The not-too-distant future is here!  Adam and I took a stab at a Tachyon 
framework during the MesosCon hackathon 
(http://github.com/mesosphere/tachyon-mesos).
We started writing in Scala, but are not at all opposed to switching to Java, 
especially if the work can be upstreamed.
--
Connor

 
 
 On Fri, Aug 15, 2014 at 5:16 PM, John Omernik j...@omernik.com wrote:
 I tried hdfs:/// and hdfs://cldbnode:7222/. Neither worked (examples below). I 
 really think the hdfs vs. other prefixes issue should be looked at. Like I said 
 above, the tachyon project just added an env variable to address this.  
 
 
 
 hdfs://cldbnode:7222/
 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0815 19:14:17.101666 22022 fetcher.cpp:76] Fetching URI 
 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
 I0815 19:14:17.101780 22022 fetcher.cpp:105] Downloading resource from 
 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
 E0815 19:14:17.778833 22022 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please 
 use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties 
 files.
 -copyToLocal: Wrong FS: 
 maprfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, expected: 
 hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
 src ... localdst
 Failed to fetch: hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Failed to synchronize with slave (it's probably exited)
 
 
 
 hdfs:/// 
 
 
 I0815 19:10:45.006803 21508 fetcher.cpp:76] Fetching URI 
 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
 I0815 19:10:45.007099 21508 fetcher.cpp:105] Downloading resource from 
 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
 E0815 19:10:45.681922 21508 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please 
 use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties 
 files.
 -copyToLocal: Wrong FS: maprfs:/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, 
 expected: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
 src ... localdst
 Failed to fetch: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Failed to synchronize with slave (it's probably exited)
 
 
 On Fri, Aug 15, 2014 at 5:38 PM, John Omernik j...@omernik.com wrote:
 I am away from my cluster right now. I tried doing a hadoop fs -ls 
 maprfs:// and that worked.  When I tried hadoop fs -ls hdfs:/// it failed 
 with wrong fs type. With that error I didn't try it in the mapred-site. I 
 will try it.  Still... why hard-code the file prefixes? I guess I am curious 
 about how glusterfs would work, or others as they pop up. 
 
 On Aug 15, 2014 5:04 PM, Adam Bordelon a...@mesosphere.io wrote:
 Can't you just use the hdfs:// protocol for maprfs? That should work just 
 fine.
 
 
 On Fri, Aug 15, 2014 at 2:50 PM, John Omernik j...@omernik.com wrote:
 Thanks all.
 
 I realized MapR has a workaround for me that I will try soon, in that I 
 have MapR fs NFS mounted on each node, i.e. I should be able to get the 
 tar from there.
 
 That said, perhaps someone with better coding skills than mine could 
 provide an env variable where a user could supply the HDFS prefixes to 
 try. I know we did that with the tachyon project and it works well for 
 other HDFS-compatible fs implementations; perhaps that would work here?  
 Hard-coding a pluggable system seems like a long-term issue that will 
 keep coming up.
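 As a sketch of that suggestion (the MESOS_HDFS_SCHEMES variable name and the
 default scheme list are invented here, purely to illustrate the idea):

```python
import os
from urllib.parse import urlparse

# Instead of hard-coding which URI schemes the Hadoop fetcher handles, read
# them from an environment variable. MESOS_HDFS_SCHEMES is a made-up name,
# and the default scheme list is illustrative only.
def use_hadoop_fetcher(uri, env=None):
    env = os.environ if env is None else env
    schemes = env.get("MESOS_HDFS_SCHEMES", "hdfs,hftp,s3,s3n").split(",")
    return urlparse(uri).scheme in schemes

print(use_hadoop_fetcher("maprfs://node:7222/x",
                         {"MESOS_HDFS_SCHEMES": "hdfs,maprfs"}))  # -> True
print(use_hadoop_fetcher("maprfs://node:7222/x", {}))             # -> False
```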
 
 On Aug 15, 2014 4:02 PM, Tim St Clair tstcl...@redhat.com wrote:
 The uri doesn't currently start with any of the known types (at least 

Re: Service Discovery with Marathon and HAProxy

2014-08-20 Thread Connor Doyle
Thanks for sharing Bart!
Will definitely take this for a spin.
--
Connor

 On Aug 19, 2014, at 8:09, Bart Spaans bart.spa...@opencredo.com wrote:
 
 Hi everyone, 
 
 I've just released a project that might be of interest to some of you. 
 
 It can be used to automatically reload HAProxy configurations when something 
 in Marathon changes so that traffic is proxied to the right ports on the 
 right hosts as soon as tasks are started and stopped. 
 
 It beats a solution in cron or DNS because the changes are instant and 
 downtime is minimised. 
 
 The project is in a fairly infant stage, but hopefully already useful to some 
 -- pull requests are welcome though!
 
 https://github.com/opencredo/mesos_service_discovery
 
 Kind regards, 
 Bart Spaans 
 
 @Work_of_Bart