Re: Mesos on RHEL 5.X
Additional problems exist with containers as well. You could also hand-roll a
kernel upgrade, which I've done for some others, and this seems to work.
However, I always make it clear that what you get is "as-is". Kind of like a
used car.

-Jason

> On Apr 19, 2016, at 2:23 PM, Jeff Schroeder wrote:
>
> Much of the functionality in mesos won't work, such as anything using the
> newer isolation modes using control groups. An example would be:
> http://mesos.apache.org/documentation/latest/network-monitoring/
>
> I guess it is fraught with dragons. Good luck!
>
>> On Tuesday, April 19, 2016, Manivannan wrote:
>> Hi Jeff,
>>
>> Thanks for your reply. We were running mesos 0.18.0 on the boxes with RHEL
>> 5.4 for a long time.
>> We cannot upgrade right away to a newer version of the operating system
>> for several reasons.
>>
>> Hi Jie,
>>
>> Thanks for your reply.
>> Do you know if I can install devtoolset-2 for RHEL 5.4 using yum?
>>
>> Thanks,
>> Mani
>>
>>> On Tue, Apr 19, 2016 at 10:54 AM, Jie Yu wrote:
>>> I know someone is still using Mesos in production on RHEL 5.4. You need
>>> devtoolset-2 to build Mesos.
>>>
>>>> On Tue, Apr 19, 2016 at 10:50 AM, Jeff Schroeder wrote:
>>>> The RHEL5 kernel will not support the necessary bits for mesos. RHEL6
>>>> also lacks the overwhelming majority of support for namespaces and
>>>> control groups. Try upgrading to RHEL7 and then giving Mesos a go. It
>>>> doesn't support older kernels.
>>>>
>>>>> On Tuesday, April 19, 2016, Manivannan wrote:
>>>>> Hi,
>>>>>
>>>>> When I searched for a Mesos rpm for RHEL 5.4, I could not find one.
>>>>> rpms are available only for RHEL 6 and 7.
>>>>>
>>>>> Is there a Mesos 0.28.0 rpm for RHEL 5.4? If not, has anyone compiled
>>>>> Mesos 0.28.0 on RHEL 5.X successfully?
>>>>>
>>>>> Thanks in advance,
>>>>> Mani
>>>>
>>>> --
>>>> Text by Jeff, typos by iPhone
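Jeff's point about control groups and namespaces can be checked on a host before attempting a build. The following is only a rough sketch, assuming a Linux box with /proc mounted and a grep that supports -w. The function names fs_listed and report_isolation_support are made up for illustration and are not part of Mesos. The presence of /proc/self/ns is a coarse signal, not a guarantee that every isolator will work.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Mesos): probe a host for the kernel
# features that the cgroup- and namespace-based isolators depend on.

# fs_listed FILE NAME: succeed if NAME appears as a whole word in a
# /proc/filesystems-style listing.
fs_listed() {
    grep -qw -- "$2" "$1" 2>/dev/null
}

# report_isolation_support [FS_FILE] [NS_DIR]: print what was found.
# Defaults point at the live kernel; the arguments exist so the check can
# be exercised against sample files.
report_isolation_support() {
    fs_file=${1:-/proc/filesystems}
    ns_dir=${2:-/proc/self/ns}
    if fs_listed "$fs_file" cgroup; then
        echo "cgroups: yes"
    else
        echo "cgroups: no"
    fi
    if [ -d "$ns_dir" ]; then
        echo "namespaces: yes"
    else
        echo "namespaces: no"
    fi
}
```

Calling report_isolation_support with no arguments inspects the running kernel. A stock RHEL 5 kernel (2.6.18) predates cgroups, so it would be expected to report "cgroups: no".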
Re: AW: Feature request: move in-flight containers w/o stopping them
Food for thought:

One should refrain from monolithic apps. If they're small and stateless you
should be doing rolling upgrades. If you find yourself with one container and
you can't easily distribute that workload by just scaling and load balancing,
then you have a monolith. Time to enhance it.

Containers should not be treated like VMs.

-Jason

> On Feb 19, 2016, at 6:05 AM, Mike Michel wrote:
>
> The question is whether you really need this when you are moving in the
> world of containers/microservices, where it is about building stateless
> 12factor apps (except databases). Why move a service when you can just kill
> it and let the work be done by 10 other containers doing the same? I
> remember a talk at DockerCon about containers and live migration. It was
> like: "And now that you know how to do it, don't do it!"
>
> From: Avinash Sridharan [mailto:avin...@mesosphere.io]
> Sent: Friday, 19 February 2016 05:48
> To: user@mesos.apache.org
> Subject: Re: Feature request: move in-flight containers w/o stopping them
>
> One problem with implementing something like vMotion for Mesos is to address
> seamless movement of network connectivity as well. This effectively requires
> moving the IP address of the container across hosts. If the container shares
> the host network stack, this won't be possible, since this would imply
> moving the host IP address from one host to another. When a container has
> its own network namespace, attached to the host using a bridge, moving
> across L2 segments might be a possibility. To move across L3 segments you
> will need some form of overlay (VXLAN, maybe?).
>
> On Thu, Feb 18, 2016 at 7:34 PM, Jay Taylor wrote:
> Is this theoretically feasible with Linux checkpoint and restore, perhaps
> via CRIU? http://criu.org/Main_Page
>
> On Feb 18, 2016, at 4:35 AM, Paul Bell wrote:
>
> Hello All,
>
> Has there ever been any consideration of the ability to move in-flight
> containers from one Mesos host node to another?
>
> I see this as analogous to VMware's "vMotion" facility wherein VMs can be
> moved from one ESXi host to another.
>
> I suppose something like this could be useful from a load-balancing
> perspective.
>
> Just curious if it's ever been considered and if so - and rejected - why
> rejected?
>
> Thanks.
>
> -Paul
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245
Re: docker based executor
What is the last command you have docker doing?

If that command exits, then docker will begin to end the container.

-Jason

> On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:
>
> Hi -
> I am looking at revving the mesos-storm framework to be dockerized (and
> simpler).
> I'm using mesos 0.22.0-1.0.ubuntu1404; mesos master + mesos slave are
> deployed in docker containers, in case it matters.
>
> I have the storm (nimbus) framework launching fine as a docker container,
> but launching tasks for a topology is having problems related to using a
> docker-based executor.
>
> For example:
>
>     TaskInfo task = TaskInfo.newBuilder()
>         .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
>         .setTaskId(taskId)
>         .setSlaveId(offer.getSlaveId())
>         .setExecutor(ExecutorInfo.newBuilder()
>             .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
>             .setData(ByteString.copyFromUtf8(executorDataStr))
>             .setContainer(ContainerInfo.newBuilder()
>                 .setType(ContainerInfo.Type.DOCKER)
>                 .setDocker(ContainerInfo.DockerInfo.newBuilder()
>                     .setImage("mesos-storm")))
>             .setCommand(CommandInfo.newBuilder().setShell(true).setValue("storm supervisor storm.mesos.MesosSupervisor"))
>     //rest is unchanged from existing mesos-storm framework code
>
> The executor launches and exits quickly - see the log msg:
> Executor for container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
>
> It seems like mesos loses track of the executor? I understand there is a
> 1 min timeout on registering the executor, but the exit happens well before
> 1 minute.
>
> I tried a few alternate commands to experiment, and I can see in the stdout
> for the task that
> "echo testing123 && echo testing456"
> prints to stdout correctly, both testing123 and testing456
>
> however:
> "echo testing123a && sleep 10 && echo testing456a"
> prints only testing123a, presumably because the container is lost and
> destroyed before the sleep time is up.
>
> So it's like the container for the executor is only allowed to run for
> .5 seconds, then it is detected as exited, and the task is lost.
>
> Thanks for any advice.
>
> Tyson
>
> slave logs look like:
> mesosslave_1 | I0417 19:07:27.461230 11 slave.cpp:1121] Got assigned task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-
> mesosslave_1 | I0417 19:07:27.461479 11 slave.cpp:1231] Launching task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-
> mesosslave_1 | I0417 19:07:27.463250 11 slave.cpp:4160] Launching executor insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1- in work directory '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1 | I0417 19:07:27.463444 11 slave.cpp:1378] Queuing task 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of framework '20150417-190611-2801799596-5050-1-
> mesosslave_1 | I0417 19:07:27.467200 7 docker.cpp:755] Starting container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and framework '20150417-190611-2801799596-5050-1-'
> mesosslave_1 | I0417 19:07:27.985935 7 docker.cpp:1333] Executor for container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
> mesosslave_1 | I0417 19:07:27.986359 7 docker.cpp:1159] Destroying container '6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1 | I0417 19:07:27.986021 9 slave.cpp:3135] Monitoring executor 'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-' in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1 | I0417 19:07:27.986464 7 docker.cpp:1248] Running docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
> mesosslave_1 | I0417 19:07:28.286761 10 slave.cpp:3186] Executor 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1- has terminated with unknown status
> mesosslave_1 | I0417 19:07:28.288784 10 slave.cpp:2508] Handling status update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task mesos-slave1.service.consul-31000 of framework 20150417-190611-2801799596-5050-1- from @0.0.0.0:0
> mesosslave_1 | W0417 19:07:28.289227 9 docker.cpp:841] Ignoring updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3
>
> nimbus logs (framework) look like:
> 2015-04-17T19:07:28.302+ s.m.MesosNimbus [INFO] Received status update: task_id { value: mesos-slave1.service.consul-31000 } state: TASK_LOST message: Container terminated slave_id { value: 20150417-190611-2801799596-5050-1-S0 } timestamp: 1.429297648286981E9 source: SOURCE_SLAVE reason:
Re: docker based executor
Try:

    until something; do
        echo waiting for something to do something
        sleep 5
    done

You can put this in a bash file and run that.

If you have a dockerfile, it would be easier to debug.

-Jason

> On Apr 17, 2015, at 4:24 PM, Tyson Norris tnor...@adobe.com wrote:
>
> Yes, agreed that the command should not exit - but the container is killed
> at around 0.5 s after launch regardless of whether the command terminates,
> which is why I've been experimenting using commands with varied exit times.
>
> For example, forget about the executor needing to register momentarily.
>
> Using the command:
> echo testing123c && sleep 0.1 && echo testing456c
> - I see the expected output in stdout, and the container is destroyed (as
> expected), because the container exits quickly, and then is destroyed
>
> Using the command:
> echo testing123d && sleep 0.6 && echo testing456d
> - I do NOT see the expected output in stdout (I only get testing123d),
> because the container is destroyed prematurely after ~0.5 seconds
>
> Using the "real" storm command, I get no output in stdout, probably because
> no output is generated within 0.5 seconds of launch - it is a bit of a pig
> to start up, so I'm currently just trying to execute some other commands
> for testing purposes.
>
> So I'm guessing this is a timeout issue, or else that the container is
> reaped inappropriately, or something else… looking through this code, I'm
> trying to figure out the steps taken during executor launch:
> https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715
>
> Thanks
> Tyson
>
>> On Apr 17, 2015, at 12:53 PM, Jason Giedymin jason.giedy...@gmail.com wrote:
>>
>> What is the last command you have docker doing?
>>
>> If that command exits, then docker will begin to end the container.
>>
>> -Jason
>>
>>> On Apr 17, 2015, at 3:23 PM, Tyson Norris tnor...@adobe.com wrote:
>>>
>>> Hi -
>>> I am looking at revving the mesos-storm framework to be dockerized (and
>>> simpler).
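Jason's wait loop at the top of this message can be wrapped in a small helper so the same script works both as a keep-alive for the container's foreground process and as a bounded retry while debugging. This is a sketch only. The name wait_until and its arguments are invented here, and the upper bound on attempts is an addition that the original loop does not have.

```shell
#!/bin/sh
# wait_until MAX_TRIES DELAY CMD [ARGS...]
# Re-run CMD until it succeeds, sleeping DELAY seconds between attempts.
# Returns 0 as soon as CMD succeeds, 1 once MAX_TRIES attempts have failed.
wait_until() {
    max_tries=$1
    delay=$2
    shift 2
    tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        echo "waiting for something to do something"
        sleep "$delay"
    done
    return 0
}
```

For example, `wait_until 60 5 test -e /tmp/ready` would poll for a marker file (a hypothetical path) every 5 seconds for up to roughly 5 minutes. As long as this runs as the container's last command, the container stays up for that long unless something outside it stops it.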
Re: docker based executor
What do any/all logs say? (syslog)

-Jason

> On Apr 17, 2015, at 7:22 PM, Tyson Norris tnor...@adobe.com wrote:
>
> another interesting fact:
> I can restart the docker container of my executor, and it runs great.
>
> In the test example below, notice the stdout appears to be growing as
> expected after restarting the container.
>
> So something is killing my executor container (also indicated by the
> "Exited (137) About a minute ago"), but I'm still not sure what.
>
> Thanks
> Tyson
>
> tnorris-osx:insights tnorris$ docker ps -a | grep testexec
> 5291fe29c9c2  testexecutor:latest  /bin/sh -c executor  About a minute ago  Exited (137) About a minute ago  mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
> tnorris-osx:insights tnorris$ docker start mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
> mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
> tnorris-osx:insights tnorris$ docker logs mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> waiting for something to do something
> tnorris-osx:insights tnorris$ docker stop mesos-f573677c-d0ee-4aa0-abba-40b7efc7cfe9
>
>> On Apr 17, 2015, at 2:11 PM, Tyson Norris tnor...@adobe.com wrote:
>>
>> You can reproduce with most any dockerfile, I think - it seems like
>> launching a custom executor that is a docker container has some problem.
>>
>> I just made a simple test with docker file:
>> --
>> #this is oracle java8 atop phusion baseimage
>> FROM opentable/baseimage-java8:latest
>>
>> #mesos lib (not used here, but will be in our "real" executor, e.g. to register the executor etc)
>> RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF
>> RUN echo "deb http://repos.mesosphere.io/$(lsb_release -is | tr '[:upper:]' '[:lower:]') $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/mesosphere.list
>> RUN cat /etc/apt/sources.list.d/mesosphere.list
>> RUN apt-get update && apt-get install -y \
>>     mesos
>>
>> ADD script.sh /usr/bin/executor-script.sh
>>
>> CMD executor-script.sh
>> --
>>
>> and script.sh:
>> --
>> #!/bin/bash
>> until false; do
>>     echo waiting for something to do something
>>     sleep 0.2
>> done
>> --
>>
>> And in my stdout I get exactly 2 lines:
>> waiting for something to do something
>> waiting for something to do something
>>
>> Which is how many lines can be output within 0.5 seconds…something is
>> fishy about the 0.5 seconds, but I'm not sure where.
>>
>> I'm not sure exactly what the difference is, but launching a docker
>> container as a task WITHOUT a custom executor works fine, and I'm not sure
>> about launching a docker container as a task that is using a non-docker
>> custom executor. The case I'm trying for is using a docker custom
>> executor, and launching non-docker tasks (in case that helps clarify the
>> situation).
>>
>> Thanks
>> Tyson
>>
>>> On Apr 17, 2015, at 1:47 PM, Jason Giedymin jason.giedy...@gmail.com wrote:
>>>
>>> Try:
>>>
>>>     until something; do
>>>         echo waiting for something to do something
>>>         sleep 5
>>>     done
>>>
>>> You can put this in a bash file and run that.
>>>
>>> If you have a dockerfile, it would be easier to debug.
>>>
>>> -Jason
>>>
>>>> On Apr 17, 2015, at 4:24 PM, Tyson Norris tnor...@adobe.com wrote:
>>>>
>>>> Yes, agreed that the command should not exit - but the container is
>>>> killed at around 0.5 s after launch regardless of whether the command
>>>> terminates, which is why I've been experimenting using commands with
>>>> varied exit times.
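The "Exited (137)" status in the docker ps output quoted above is worth decoding. By the usual shell convention, a status above 128 means the process was terminated by signal (status - 128), so 137 corresponds to signal 9 (SIGKILL). That is consistent with something killing the container from outside rather than the command finishing by itself. A small sketch follows; the helper name describe_exit is made up for illustration.

```shell
#!/bin/sh
# describe_exit STATUS: interpret a process or container exit status.
# A status above 128 conventionally means "terminated by signal
# (STATUS - 128)"; kill -l maps the signal number to its name.
describe_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited on its own with status $status"
    fi
}
```

Running describe_exit 137 reports signal 9 (KILL). A container that had been stopped gracefully with SIGTERM would normally show 143 instead.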
Re: mesos and coreos?
CoreOS places focus on the OS to deploy services as containers. Its
distributed key store is meant to share config in a cluster and to aid in
basic scheduling via fleet, which is like cluster-wide systemd. Its scheduler
is basic (but can be made more complex if you were to use these base tools).

On the other hand, Mesos has a more complex, featureful scheduler, works as an
application, and has more first-class controls over managing jobs (cgroups,
etc…).

There is not complete overlap between these two systems. They do not
necessarily compete with each other. But they do have features which try to
address distributed application design/deployment.

- J

> On Jan 18, 2015, at 1:29 PM, Victor L vlyamt...@gmail.com wrote:
>
>> Hope this helps some
>
> It doesn't, as it doesn't even try to answer my question. Let me rephrase
> it: what does mesos on the coreos cluster do that coreos itself doesn't do
> already?
>
>> On Sun, Jan 18, 2015 at 10:00 AM, Jason Giedymin jason.giedy...@gmail.com wrote:
>>
>> The value of coreos that immediately comes to mind, since I do much work
>> with these tools:
>>
>> - the small footprint: it is a minimal os, meant to run containers. So it
>>   throws everything not needed for that out.
>> - containers are the launch vehicle, thus deps are in container land. I
>>   can run and test containers with ease, not having to worry about
>>   multiple OSes.
>> - with etcd and fleet, coordinating the launch and modification of both
>>   machines and cluster is a breeze, allowing you to do dynamic mesos
>>   scaling up or down. I add nodes at will, across multiple cloud
>>   platforms, ready to launch a multitude of containers or just mesos.
>> - security. There is a defined write strategy. You cannot write willy
>>   nilly to any location.
>> - all the above further allow auto OS updates, which is supported today on
>>   all platforms that deploy coreos. This means more frequent updates since
>>   the os is minimal, which should increase the security effectiveness when
>>   compared to big box superstore OSes like Red Hat or Ubuntu. Some
>>   platforms charge quite a bit for managed updates of this frequency and
>>   level of testing.
>>
>> Coreos allows me to keep apps in a configured container that I trust, that
>> is tested, and that works time and time again.
>>
>> I see coreos as a complement.
>>
>> As an fyi, I'm available for questions, debugging, and client work in this
>> area.
>>
>> Hope this helps some, from real world usage.
>>
>> Sent from my iPad
>>
>>> On Jan 18, 2015, at 9:16 AM, Victor L vlyamt...@gmail.com wrote:
>>>
>>> I am confused: what's the value of mesos on top of a coreos cluster?
>>> Mesos provides distributed resource management, fault tolerance, etc.,
>>> but doesn't coreos provide the same things already?
>>> Thanks
Re: Autoscaling Mesos Clusters
You would be surprised how far you can get by just scaling up when resource
offers are 'tight' and keeping track of idle CPU for each slave so you can
shut them down.

-Jason

> On May 30, 2014, at 5:57 PM, Diptanu Choudhury dipta...@gmail.com wrote:
>
> Hi,
>
> I am currently working on designing an auto-scaling solution for Mesos
> slaves in AWS and would love to get some feedback around that.
>
> There are a couple of ways of doing it, and I was thinking of starting with
> the simple cases first -
>
> a. Define the lowest resource offer a framework can afford to get, and then
> use the information published by the Mesos master in state.json to
> determine if the cluster has enough resources. If we see that the available
> resources won't satisfy the lower bounds set, we bring up new EC2 instances
> with enough resources that Mesos could use to make offers.
>
> b. Latency for getting an offer for a given job. Say that the framework has
> a job which needs x cpu, y memory and z ports. If the framework doesn't get
> an offer within t amount of time, the ASG with slaves of an EC2 instance
> type which can offer that amount of resource is autoscaled.
>
> c. Maintain historical information about the resources used and the jobs
> submitted and running in Mesos, and use that information for doing
> predictive autoscaling.
>
> I would like to understand if there are potentially better ways of
> achieving elasticity in a Mesos cluster, where the complexity lies, and
> what information Mesos could provide us to make it more efficient.
>
> --
> Thanks,
> Diptanu Choudhury
> Web - www.linkedin.com/in/diptanu
> Twitter - @diptanu
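Option (a) in Diptanu's list comes down to comparing the cluster's free resources against the smallest offer a framework can use. The sketch below shows only that comparison. It assumes the free CPU and memory figures have already been extracted from the master's published state by some other means, and the names less_than and scale_decision are invented for illustration. Triggering the actual EC2 scale-up is left out.

```shell
#!/bin/sh
# less_than A B: succeed if A < B. awk is used so that fractional values
# (for example 0.5 CPUs) compare correctly.
less_than() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 < b + 0) }'
}

# scale_decision FREE_CPUS FREE_MEM MIN_CPUS MIN_MEM
# Prints "scale-up" when either free resource is below the smallest offer
# the framework can afford to accept, otherwise "hold".
scale_decision() {
    if less_than "$1" "$3" || less_than "$2" "$4"; then
        echo "scale-up"
    else
        echo "hold"
    fi
}
```

For instance, `scale_decision 0.5 4096 1 1024` prints "scale-up", because half a CPU is below the one-CPU lower bound even though memory is plentiful. The idle-CPU tracking Jason mentions for scale-down would need per-slave usage data and is not covered here.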