Re: Mesos on RHEL 5.X

2016-04-19 Thread Jason Giedymin
Additional problems exist with containers as well. 

You could also hand-roll a kernel upgrade, which I've done for a few others, and 
this seems to work. However, I always make it clear that what you get is 
"as-is". Kind of like a used car.

-Jason

> On Apr 19, 2016, at 2:23 PM, Jeff Schroeder  
> wrote:
> 
> Much of the functionality in Mesos won't work, such as anything using the 
> newer isolation modes based on control groups. An example would be:
> http://mesos.apache.org/documentation/latest/network-monitoring/
> 
> I guess it is fraught with dragons. Good luck!
> 
>> On Tuesday, April 19, 2016, Manivannan  wrote:
>> Hi Jeff,
>> 
>> Thanks for your reply. We have been running Mesos 0.18.0 on boxes with RHEL 
>> 5.4 for a long time. 
>> We cannot upgrade to a newer version of the operating system right away for 
>> several reasons. 
>> 
>> Hi Jie,
>> 
>> Thanks for your reply. 
>> Do you know if I can install devtoolset-2 for RHEL 5.4 using yum? 
>> 
>> Thanks,
>> Mani
>> 
>>> On Tue, Apr 19, 2016 at 10:54 AM, Jie Yu  wrote:
>>> I know someone is still using Mesos in production on RHEL 5.4. You need 
>>> devtoolset-2 to build Mesos.
>>> 
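(For reference, devtoolset-2 is normally pulled in from a Software Collections repo. 
The repo URL and package names below are the ones published for CentOS/RHEL 6, so 
treat this as a sketch that may need adjusting, or may not work at all, on 5.4:)

    # assumption: the CentOS 6 devtools-2 SCL publish; not verified on RHEL 5.4
    sudo wget http://people.centos.org/tru/devtools-2/devtools-2.repo \
        -O /etc/yum.repos.d/devtools-2.repo
    sudo yum install -y devtoolset-2-gcc devtoolset-2-gcc-c++ devtoolset-2-binutils
    # put the newer toolchain on PATH for the build shell, then build Mesos as usual
    scl enable devtoolset-2 bash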
 On Tue, Apr 19, 2016 at 10:50 AM, Jeff Schroeder 
  wrote:
 The RHEL5 kernel will not support the necessary bits for mesos. RHEL6 also 
 lacks the overwhelming majority of support for namespaces and control 
 groups. Try upgrading to RHEL7 and then giving Mesos a go. It doesn't 
 support older kernels.
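 (A quick way to check what a given kernel actually exposes, using standard /proc 
 and mount queries, nothing Mesos-specific:)

    uname -r              # kernel version
    ls /proc/self/ns      # which namespace types the kernel exposes
    cat /proc/cgroups     # which cgroup subsystems are compiled in
    mount | grep cgroup   # whether any cgroup hierarchies are mounted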
 
 
> On Tuesday, April 19, 2016, Manivannan  wrote:
> Hi,
> 
> When I searched for a Mesos rpm for RHEL 5.4, I could not find one. RPMs 
> are available only for RHEL 6 and 7. 
> 
> Is there a Mesos 0.28.0 rpm for RHEL 5.4? If not, has anyone compiled 
> Mesos 0.28.0 on RHEL 5.X successfully? 
> 
> 
> Thanks in advance,
> Mani
 
 
 -- 
 Text by Jeff, typos by iPhone
> 
> 
> -- 
> Text by Jeff, typos by iPhone


Re: AW: Feature request: move in-flight containers w/o stopping them

2016-02-19 Thread Jason Giedymin
Food for thought:

One should refrain from monolithic apps. If they're small and stateless, you 
should be doing rolling upgrades. 

If you find yourself with one container and you can't easily distribute that 
workload by just scaling and load balancing, then you have a monolith. Time to 
enhance it.

Containers should not be treated like VMs.

-Jason

> On Feb 19, 2016, at 6:05 AM, Mike Michel  wrote:
> 
> The question is whether you really need this when you are moving into the world of 
> containers/microservices, where it is about building stateless 12-factor apps 
> (databases excepted). Why move a service when you can just kill it and let the 
> work be done by 10 other containers doing the same thing? I remember a talk at 
> DockerCon about containers and live migration. It was like: "And now that 
> you know how to do it, don't do it!"
>  
> From: Avinash Sridharan [mailto:avin...@mesosphere.io] 
> Sent: Friday, 19 February 2016 05:48
> To: user@mesos.apache.org
> Subject: Re: Feature request: move in-flight containers w/o stopping them
>  
> One problem with implementing something like vMotion for Mesos is addressing 
> seamless movement of network connectivity as well. This effectively requires 
> moving the IP address of the container across hosts. If the container shares the 
> host network stack, this won't be possible, since it would imply moving the 
> host IP address from one host to another. When a container has its own network 
> namespace, attached to the host using a bridge, moving across L2 segments 
> might be a possibility. To move across L3 segments you will need some form of 
> overlay (VxLAN, maybe?). 
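(As a rough illustration of the overlay plumbing that implies, at the iproute2 level; 
the interface names, VNI, and multicast group below are made up, and none of this is 
something Mesos sets up for you:)

    # VxLAN interface (VNI 42) riding over eth0, bridged so a container's
    # veth can be attached and keep its IP across an L3 move
    ip link add vxlan42 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789
    ip link add br-overlay type bridge
    ip link set vxlan42 master br-overlay
    ip link set vxlan42 up
    ip link set br-overlay up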
>  
> On Thu, Feb 18, 2016 at 7:34 PM, Jay Taylor  wrote:
> Is this theoretically feasible with Linux checkpoint and restore, perhaps via 
> CRIU? http://criu.org/Main_Page
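(Outside of Mesos, the CRIU side of that experiment looks roughly like the sketch 
below; the PID and image directory are placeholders, and wiring this into Docker or 
a Mesos containerizer is a separate problem entirely:)

    # checkpoint a process tree into an images directory on the source host
    criu dump -t <pid> -D /tmp/ckpt --shell-job --tcp-established
    # copy /tmp/ckpt to the destination host, then restore the tree there
    criu restore -D /tmp/ckpt --shell-job --tcp-established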
> 
> On Feb 18, 2016, at 4:35 AM, Paul Bell  wrote:
> 
> Hello All,
>  
> Has there ever been any consideration of the ability to move in-flight 
> containers from one Mesos host node to another?
>  
> I see this as analogous to VMware's "vMotion" facility wherein VMs can be 
> moved from one ESXi host to another.
>  
> I suppose something like this could be useful from a load-balancing 
> perspective.
>  
> Just curious whether it's ever been considered, and if it was considered but 
> rejected, why it was rejected?
>  
> Thanks.
>  
> -Paul
>  
>  
> 
> 
>  
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245


Re: docker based executor

2015-04-17 Thread Jason Giedymin
What is the last command you have Docker running?

If that command exits, then Docker will begin to tear down the container.

-Jason

 On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:
 
 Hi -
 I am looking at revving the mesos-storm framework to be dockerized (and 
 simpler). 
 I’m using mesos 0.22.0-1.0.ubuntu1404
 mesos master + mesos slave are deployed in docker containers, in case it 
 matters. 
 
 I have the storm (nimbus) framework launching fine as a docker container, but 
 launching tasks for a topology is having problems related to using a 
 docker-based executor.
 
 For example. 
 
 TaskInfo task = TaskInfo.newBuilder()
     .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
     .setTaskId(taskId)
     .setSlaveId(offer.getSlaveId())
     .setExecutor(ExecutorInfo.newBuilder()
         .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
         .setData(ByteString.copyFromUtf8(executorDataStr))
         .setContainer(ContainerInfo.newBuilder()
             .setType(ContainerInfo.Type.DOCKER)
             .setDocker(ContainerInfo.DockerInfo.newBuilder()
                 .setImage("mesos-storm")))
         .setCommand(CommandInfo.newBuilder().setShell(true).setValue("storm supervisor storm.mesos.MesosSupervisor"))
     // rest is unchanged from existing mesos-storm framework code
 
 The executor launches and exits quickly - see the log msg:  Executor for 
 container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
 
 It seems like Mesos loses track of the executor? I understand there is a 1-minute 
 timeout on registering the executor, but the exit happens well before 1 
 minute.
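 (For reference, that 1-minute window is the slave's --executor_registration_timeout 
 flag; raising it is a quick way to rule the registration timeout out, though the 
 ~0.5 second exit described below points at something else. A sketch with a 
 placeholder master address:)

     # sketch: start the slave with a longer executor registration window
     mesos-slave --master=zk://<zk-host>:2181/mesos \
       --containerizers=docker,mesos \
       --executor_registration_timeout=5mins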
 
 I tried a few alternate commands to experiment, and I can see in the stdout 
 for the task that
 echo "testing123" && echo "testing456"
 prints to stdout correctly, both testing123 and testing456
 
 however:
 echo "testing123a" && sleep 10 && echo "testing456a"
 prints only testing123a, presumably because the container is lost and 
 destroyed before the sleep time is up.
 
 So it’s like the container for the executor is only allowed to run for .5 
 seconds, then it is detected as exited, and the task is lost. 
 
 Thanks for any advice.
 
 Tyson
 
 
 
 slave logs look like:
 mesosslave_1  | I0417 19:07:27.461230 11 slave.cpp:1121] Got assigned task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.461479 11 slave.cpp:1231] Launching task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.463250 11 slave.cpp:4160] Launching executor insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in work directory '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.463444 11 slave.cpp:1378] Queuing task 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of framework '20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and framework '20150417-190611-2801799596-5050-1-'
 mesosslave_1  | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
 mesosslave_1  | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying container '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring executor 'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-' in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:28.286761 10 slave.cpp:3186] Executor 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1- has terminated with unknown status
 mesosslave_1  | I0417 19:07:28.288784 10 slave.cpp:2508] Handling status update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task mesos-slave1.service.consul-31000 of framework 20150417-190611-2801799596-5050-1- from @0.0.0.0:0
 mesosslave_1  | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3
 
 nimbus logs (framework) look like:
 2015-04-17T19:07:28.302+ s.m.MesosNimbus [INFO] Received status update: 
 task_id {
  value: mesos-slave1.service.consul-31000
 }
 state: TASK_LOST
 message: Container terminated
 slave_id {
  value: 20150417-190611-2801799596-5050-1-S0
 }
 timestamp: 1.429297648286981E9
 source: SOURCE_SLAVE
 reason: 

Re: docker based executor

2015-04-17 Thread Jason Giedymin
Try: 

until something; do
  echo waiting for something to do something
  sleep 5
done

You can put this in a bash file and run that.

If you have a Dockerfile, it would be easier to debug.

-Jason

 On Apr 17, 2015, at 4:24 PM, Tyson Norris tnor...@adobe.com wrote:
 
 Yes, agreed that the command should not exit - but the container is killed at 
 around 0.5 s after launch regardless of whether the command terminates, which 
 is why I’ve been experimenting with commands that have varied exit times. 
 
 For example, forget about the executor needing to register momentarily.
 
 Using the command:
 echo "testing123c" && sleep 0.1 && echo "testing456c"
 - I see the expected output in stdout, and the container exits quickly and is 
 then destroyed (as expected)
 
 Using the command:
 echo "testing123d" && sleep 0.6 && echo "testing456d"
 - I do NOT see the expected output in stdout (I only get testing123d), 
 because the container is destroyed prematurely after ~0.5 seconds
 
 Using the “real” storm command, I get no output in stdout, probably because 
 no output is generated within 0.5 seconds of launch - it is a bit of a pig to 
 start up, so I’m currently just trying to execute some other commands for 
 testing purposes.
 
 So I’m guessing this is a timeout issue, or else that the container is reaped 
 inappropriately, or something else… looking through this code, I’m trying to 
 figure out the steps taken during executor launch:
 https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715
 
 Thanks
 Tyson
   
 
 
 
 
 On Apr 17, 2015, at 12:53 PM, Jason Giedymin jason.giedy...@gmail.com 
 wrote:
 
 What is the last command you have Docker running?
 
 If that command exits, then Docker will begin to tear down the container.
 
 -Jason
 
 On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:
 
 Hi -
 I am looking at revving the mesos-storm framework to be dockerized (and 
 simpler). 
 I’m using mesos 0.22.0-1.0.ubuntu1404
 mesos master + mesos slave are deployed in docker containers, in case it 
 matters. 
 
 I have the storm (nimbus) framework launching fine as a docker container, 
 but launching tasks for a topology is having problems related to using a 
 docker-based executor.
 
 For example. 
 
  TaskInfo task = TaskInfo.newBuilder()
      .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
      .setTaskId(taskId)
      .setSlaveId(offer.getSlaveId())
      .setExecutor(ExecutorInfo.newBuilder()
          .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
          .setData(ByteString.copyFromUtf8(executorDataStr))
          .setContainer(ContainerInfo.newBuilder()
              .setType(ContainerInfo.Type.DOCKER)
              .setDocker(ContainerInfo.DockerInfo.newBuilder()
                  .setImage("mesos-storm")))
          .setCommand(CommandInfo.newBuilder().setShell(true).setValue("storm supervisor storm.mesos.MesosSupervisor"))
      // rest is unchanged from existing mesos-storm framework code
 
 The executor launches and exits quickly - see the log msg:  Executor for 
 container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
 
 It seems like Mesos loses track of the executor? I understand there is a 1-minute 
 timeout on registering the executor, but the exit happens well before 1 
 minute.
 
 I tried a few alternate commands to experiment, and I can see in the stdout 
 for the task that
 echo "testing123" && echo "testing456"
 prints to stdout correctly, both testing123 and testing456
 
 however:
 echo "testing123a" && sleep 10 && echo "testing456a"
 prints only testing123a, presumably because the container is lost and 
 destroyed before the sleep time is up.
 
 So it’s like the container for the executor is only allowed to run for .5 
 seconds, then it is detected as exited, and the task is lost. 
 
 Thanks for any advice.
 
 Tyson
 
 
 
 slave logs look like:
 mesosslave_1  | I0417 19:07:27.461230 11 slave.cpp:1121] Got assigned task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.461479 11 slave.cpp:1231] Launching task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.463250 11 slave.cpp:4160] Launching executor insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in work directory '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
 mesosslave_1  | I0417 19:07:27.463444 11 slave.cpp:1378] Queuing task 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of framework '20150417-190611-2801799596-5050-1-
 mesosslave_1  | I0417 19:07:27.467200 7 docker.cpp:755] Starting container

Re: docker based executor

2015-04-17 Thread Jason Giedymin
What do any/all logs say? (syslog)

-Jason

 On Apr 17, 2015, at 7:22 PM, Tyson Norris tnor...@adobe.com wrote:
 
 another interesting fact:
 I can restart the docker container of my executor, and it runs great. 
 
 In the test example below, notice the stdout appears to be growing as 
 expected after restarting the container.
 
 So something is killing my executor container (also indicated by the "Exited 
 (137) About a minute ago"), but I’m still not sure what.
 
 Thanks
 Tyson
 
 
 
 tnorris-osx:insights tnorris$ docker ps -a | grep testexec
 5291fe29c9c2   testexecutor:latest   "/bin/sh -c executor"   About a minute ago   Exited (137) About a minute ago       mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
 tnorris-osx:insights tnorris$ docker start mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
 mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
 tnorris-osx:insights tnorris$ docker logs mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 waiting for something to do something
 tnorris-osx:insights tnorris$ docker stop mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
 
 
 On Apr 17, 2015, at 2:11 PM, Tyson Norris tnor...@adobe.com wrote:
 
 You can reproduce this with most any Dockerfile, I think - it seems like 
 launching a custom executor that is a docker container has some problem. 
 
 I just made a simple test with docker file:
 --
 #this is oracle java8 atop phusion baseimage
 FROM opentable/baseimage-java8:latest
 
 
 #mesos lib (not used here, but will be in our “real” executor, e.g. to 
 register the executor etc)
 RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF
 RUN echo "deb http://repos.mesosphere.io/$(lsb_release -is | tr '[:upper:]' '[:lower:]') $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/mesosphere.list
 RUN cat /etc/apt/sources.list.d/mesosphere.list
 RUN apt-get update && apt-get install -y \
     mesos
 
 ADD script.sh /usr/bin/executor-script.sh
 
 CMD executor-script.sh
 --
 
 and script.sh:
 --
 #!/bin/bash
 until false; do
   echo waiting for something to do something
   sleep 0.2
 done
 --
 
 And in my stdout I get exactly 2 lines:
 waiting for something to do something
 waiting for something to do something
 
 Which is how many lines can be output within 0.5 seconds… something is 
 fishy about the 0.5 seconds, but I’m not sure where.
 
 I’m not sure exactly what the difference is, but launching a docker container as a 
 task WITHOUT a custom executor works fine, and I’m not sure about launching 
 a docker container as a task that is using a non-docker custom executor. The 
 case I’m trying for is using a docker custom executor, and launching 
 non-docker tasks. (In case that helps clarify the situation.)
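 (One way to separate Mesos from the image itself is to build and run the executor 
 image directly with Docker; the tag matches the testexecutor image used above:)

     docker build -t testexecutor .
     # should keep printing "waiting for something to do something" until stopped
     docker run --rm testexecutor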
 
 Thanks
 Tyson
 
 
 
 
 
 On Apr 17, 2015, at 1:47 PM, Jason Giedymin jason.giedy...@gmail.com 
 wrote:
 
 Try: 
 
 until something; do
   echo waiting for something to do something
   sleep 5
 done
 
 You can put this in a bash file and run that.
 
 If you have a Dockerfile, it would be easier to debug.
 
 -Jason
 
 On Apr 17, 2015, at 4:24 PM, Tyson Norris tnor...@adobe.com wrote:
 
 Yes, agreed that the command should not exit - but the container is killed 
 at around 0.5 s after launch regardless of whether the command terminates, 
 which is why I’ve been experimenting with commands that have varied exit 
 times. 
 
 For example, forget about the executor needing to register momentarily.
 
 Using the command:
 echo "testing123c" && sleep 0.1 && echo "testing456c"
 - I see the expected output in stdout, and the container exits quickly and is 
 then destroyed (as expected)
 
 Using the command:
 echo "testing123d" && sleep 0.6 && echo "testing456d"
 - I do NOT see the expected output in stdout (I only get testing123d), 
 because the container is destroyed

Re: mesos and coreos?

2015-01-18 Thread Jason Giedymin
CoreOS places its focus on the OS, deploying services as containers. Its 
distributed key store is meant to share config in a cluster and to aid basic 
scheduling via fleet, which is like cluster-wide systemd.

Its scheduler is basic (but can be made more complex if you build on these 
base tools). Mesos, on the other hand, has a more complex, featureful 
scheduler, works as an application, and has more first-class controls over 
managing jobs (cgroups, etc.).

There is not complete overlap between these two systems, and they do not 
necessarily compete with each other. But they both have features that try to 
address distributed application design/deployment.

- J

 On Jan 18, 2015, at 1:29 PM, Victor L vlyamt...@gmail.com wrote:
 
 Hope this helps some
 It doesn't, as it doesn't even try to answer my question. Let me re-phrase 
 it: what does Mesos on top of a CoreOS cluster do that CoreOS itself doesn't do 
 already? 
 
 On Sun, Jan 18, 2015 at 10:00 AM, Jason Giedymin jason.giedy...@gmail.com wrote:
 The value of coreos that immediately comes to mind since I do much work with 
 these tools:
 
  - the small footprint; it is a minimal OS, meant to run containers, so it 
 throws out everything not needed for that.
  - containers are the launch vehicle, thus deps live in container land. I can 
 run and test containers with ease, not having to worry about multiple OSes.
  - with etcd and fleet, coordinating the launch and modification of both 
 machines and the cluster is a breeze, allowing you to scale Mesos dynamically 
 up or down. I add nodes at will, across multiple cloud platforms, ready to 
 launch a multitude of containers or just Mesos.
  - security. There is a defined write strategy. You cannot write willy-nilly 
 to any location.
  - all of the above further allows automatic OS updates, which are supported today on 
 all platforms that deploy CoreOS. This means more frequent updates, since the 
 OS is minimal, which should increase security effectiveness when compared 
 to big-box superstore OSes like Red Hat or Ubuntu. Some platforms charge quite 
 a bit for managed updates of this frequency and level of testing.
 
 CoreOS allows me to keep apps in a configured container that I trust, have tested, 
 and that works time and time again.
 
 I see CoreOS as a complement.
 
 As an FYI, I'm available for questions, debugging, and client work in this area.
 
 Hope this helps some, from real world usage.
 
 Sent from my iPad
 
  On Jan 18, 2015, at 9:16 AM, Victor L vlyamt...@gmail.com wrote:
 
  I am confused: what's the value of Mesos on top of a CoreOS cluster? 
  Mesos provides distributed resource management, fault tolerance, etc., but 
  doesn't CoreOS provide the same things already?
  Thanks
 



Re: Autoscaling Mesos Clusters

2014-05-30 Thread Jason Giedymin
You would be surprised how far just scaling up when resource offers are 'tight', 
and keeping track of idle CPU for each slave so you can shut them down, can 
take you.
-Jason
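
A rough sketch of that idle-CPU bookkeeping against the master's state.json 
(requires curl and jq; the per-slave resources/used_resources field names are an 
assumption about what your Mesos version reports, so adjust to the JSON you 
actually get back):

    #!/bin/bash
    # print idle CPUs per slave by diffing total vs. used resources
    MASTER=${MASTER:-http://mesos-master:5050}
    curl -s "$MASTER/master/state.json" | jq -r '
      .slaves[] |
      "\(.hostname) idle_cpus=\((.resources.cpus // 0) - (.used_resources.cpus // 0))"'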

 On May 30, 2014, at 5:57 PM, Diptanu Choudhury dipta...@gmail.com wrote:
 
 Hi,
 
 I am currently working on designing an auto-scaling solution for Mesos slaves 
 in AWS and would love to get some feedback on it. There are a couple of 
 ways of doing it, and I was thinking of starting with the simple cases first -
 
 a. Define the lowest resource offer a framework can afford to get, and then 
 start using the information published by the Mesos master in state.json to 
 determine whether the cluster has enough resources. If we see that the available 
 resources won't satisfy the lower bounds set, we bring up new EC2 instances 
 with enough resources that Mesos can use to make offers.
 
 b. Latency for getting an offer for a given job. Say the framework has a 
 job which needs x cpu, y memory and z ports. If the framework doesn't get an 
 offer within t amount of time, the ASG with slaves of an EC2 instance type that 
 can offer that amount of resources is scaled up. 
 
 c. Maintain historical information about the resources used and the jobs submitted 
 and running in Mesos, and use that information for predictive 
 autoscaling.
 
 I would like to understand whether there are potentially better ways of achieving 
 elasticity in a Mesos cluster, where the complexity lies, and what information 
 Mesos could provide to make it more efficient.
 
 -- 
 Thanks,
 Diptanu Choudhury
 Web - www.linkedin.com/in/diptanu
 Twitter - @diptanu