Re: Mesos-DNS host based HTTP-redirection from slave to container
Hey Itamar,

Using DNS to redirect to a port will only be possible if you're using SRV records (I'm not sure what mesos-dns uses), but this doesn't really matter, as SRV records won't be looked up by the browser anyway.

For this I have a small daemon written in Go running on a number of hosts (that aren't slaves); it locates the marathon master and pulls down my apps. I tag apps with a Host label (something like foo.example.com) and then generate a haproxy config file with backends selected by the Host header. There are a few more smarts in it, such as only pulling apps with a green healthcheck. This daemon manages the lifecycle of haproxy on the node - it uses a polling model, not an event-driven one fed from the marathon event stream.

Another solution that does use the event stream is https://github.com/QubitProducts/bamboo - it's been a while since I checked it out, but it was functional back then.

Hope that helps.

ryan

On 2 August 2015 at 07:10, Itamar Ostricher ita...@yowza3d.com wrote: I use marathon to launch an nginx docker container named my-app, and set up Mesos-DNS such that my-app.marathon.mesos returns the IP of the slave running the container (e.g. 10.20.30.40). Now, my-app is running on some dynamically allocated port (e.g. 31001), but I would like http://my-app.marathon.mesos/foo to hit my app at http://10.20.30.40:31001/foo. Is there a best-practice way to achieve this behavior? I was thinking about a proxy running on each slave, listening on port 80, redirecting incoming HTTP requests based on the request host to the correct port on localhost. The correct port can be determined by querying mesos-dns itself. This sounds like a pretty common use-case, so I wondered if anyone can point me at an existing solution for this. Thanks! - Itamar.
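For illustration, the haproxy config such a daemon might generate could look roughly like this; the backend name and server address are invented for the example, reusing foo.example.com and the slave/port from this thread:

```
frontend http-in
    bind *:80
    acl host_foo hdr(host) -i foo.example.com
    use_backend foo_app if host_foo

backend foo_app
    # one server line per healthy task, pointing at slave-ip:dynamic-port
    server foo_app-0 10.20.30.40:31001 check
```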
Re: Mesos-DNS host based HTTP-redirection from slave to container
Yes, it appears that mesos-dns does use SRV records - I should really check it out :)
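Since mesos-dns serves SRV records, the port lives in the record's RDATA as `priority weight port target`. A minimal sketch of pulling the port out of a record string; the record shown is an invented example of what a lookup for `_my-app._tcp.marathon.mesos` might return:

```python
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "priority weight port target")

def parse_srv(rdata):
    """Parse the RDATA of a DNS SRV record, e.g. as printed by
    `dig _my-app._tcp.marathon.mesos SRV +short`."""
    priority, weight, port, target = rdata.split()
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))
```

For example, `parse_srv("0 0 31001 my-app-s0.marathon.slave.mesos.")` yields a record whose `.port` is the dynamically allocated task port.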
Re: Mesos-DNS host based HTTP-redirection from slave to container
If you are going to be pulling the data down yourself, it would be better to do it from marathon than from mesos-dns, as you will have additional data about the tasks available.

On 2 August 2015 at 11:12, tommy xiao xia...@gmail.com wrote: mesos-dns stores the app's IP and ports, so you can query mesos-dns to set up a route rule to define the URL. -- Deshi Xiao Twitter: xds2000 E-mail: xiaods(AT)gmail.com
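As a sketch of the polling approach discussed in this thread - pull tasks from Marathon, then emit haproxy stanzas keyed by a Host label - assuming task dicts shaped roughly like Marathon's /v2/tasks output and a hypothetical host_labels mapping (neither is a literal API schema):

```python
def render_backends(tasks, host_labels):
    """tasks: [{'appId': ..., 'host': ..., 'ports': [...]}, ...], roughly the
    shape Marathon's /v2/tasks endpoint returns; host_labels maps appId to the
    Host label (e.g. 'foo.example.com') attached to the app.
    Returns haproxy acl/use_backend/backend stanzas as one string."""
    by_app = {}
    for t in tasks:
        by_app.setdefault(t["appId"], []).append(t)
    lines = []
    for app_id, app_tasks in sorted(by_app.items()):
        host = host_labels.get(app_id)
        if host is None:
            continue  # only route apps that carry a Host label
        name = app_id.strip("/").replace("/", "_")
        lines.append(f"acl host_{name} hdr(host) -i {host}")
        lines.append(f"use_backend {name} if host_{name}")
        lines.append(f"backend {name}")
        for i, t in enumerate(app_tasks):
            # slave host plus the dynamically allocated port
            lines.append(f"    server {name}-{i} {t['host']}:{t['ports'][0]} check")
    return "\n".join(lines)
```

A real daemon would fetch the task list on a timer, render this, diff against the current config, and reload haproxy only when something changed.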
Mesos 0.21.0 release page correction
Whilst this is a bit old, the docs here http://mesos.apache.org/blog/mesos-0-21-0-released/ for 0.21.0 link to the wrong ticket for the shared filesystem isolator. Cheers, ryan
Re: CPU resource allocation: ignore?
Hey Don,

Have you tried setting only the 'cgroups/mem' isolation flag on the slave, and not the cpu one? http://mesosphere.com/docs/reference/mesos-slave/

ryan

On 19 February 2015 at 14:13, Donald Laidlaw donlaid...@me.com wrote: I am using Mesos 0.21.1 with Marathon 0.8.0 and running everything in docker containers. Is there a way to have mesos ignore the cpu relative shares? That is, not limit the docker container CPU at all when it runs. I would still want the memory resource limitation, but would rather just let the linux system under the containers schedule all the CPU. This would allow us to allocate tasks to mesos slaves based on available memory only, and to let those tasks get whatever CPU they can when they need it. This is desirable where there are lots of relatively high-memory tasks that have very low CPU requirements, especially if we do not know the CPU capabilities of the slave machines. Some may have fast CPUs, some slow, so it is hard to pick a relative number for a slave. Thanks, Don Laidlaw
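With the Mesosphere packaging, slave flags can be set by dropping a file named after the flag into /etc/mesos-slave; a sketch of setting only memory isolation as suggested above (flag values vary by Mesos version, so check the docs for the release you run):

```
# /etc/mesos-slave/isolation -- picked up by the init wrapper as --isolation=cgroups/mem
cgroups/mem
```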
Re: Unable to follow Sandbox links from Mesos UI.
It is a request from your browser session, not from the master, that goes to the slaves - so in order to view the sandbox you need to ensure that the machine your browser is on can resolve and route to the masters _and_ the slaves. The master doesn't proxy the sandbox requests through itself (yet) - they are made directly from your browser instance to the slaves. Make sure you can resolve the slaves from the machine you're browsing the UI on.

Cheers,

ryan

On 22 January 2015 at 15:42, Dan Dong dongda...@gmail.com wrote: Thank you all, the master and slaves can resolve each others' hostnames and ssh login without password; firewalls have been switched off on all the machines too. So I'm confused about what would block the UI from pulling this info from the slaves. Cheers, Dan

2015-01-21 16:35 GMT-06:00 Cody Maloney c...@mesosphere.io: Also see https://issues.apache.org/jira/browse/MESOS-2129 if you want to track progress on changing this. Unfortunately it is on hold for me at the moment to fix. Cody

On Wed, Jan 21, 2015 at 2:07 PM, Ryan Thomas r.n.tho...@gmail.com wrote: Hey Dan, The UI will attempt to pull that info directly from the slave, so you need to make sure the host is resolvable and routable from your browser. Cheers, Ryan From my phone

On Wednesday, 21 January 2015, Dan Dong dongda...@gmail.com wrote: Hi, All, When I try to access a sandbox in the mesos UI, I see the following info (the same error appears for every slave sandbox): Failed to connect to slave '20150115-144719-3205108908-5050-4552-S0' on 'centos-2.local:5051'. Potential reasons: The slave's hostname, 'centos-2.local', is not accessible from your network; The slave's port, '5051', is not accessible from your network. I checked that: slave centos-2.local can be logged into from any machine in the cluster without a password via ssh centos-2.local; port 5051 on slave centos-2.local can be connected to from the master via telnet centos-2.local 5051. The stdout and stderr are there in each slave's /tmp/mesos/..., but it seems the mesos UI just cannot access them. (Both master and slaves are on the same network IP range.) Should I open any port on the slaves? Any hint what the problem is here? Cheers, Dan
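The two "Potential reasons" above - name resolution and port reachability - can be checked with a small script; a sketch, to be run from the machine where the browser lives (not from the master, since the master's connectivity is not what the UI uses):

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Check the two failure modes the Mesos UI lists: can the name be
    resolved, and can the slave's HTTP port be reached over TCP?
    Returns a (resolves, connects) pair of booleans."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return (False, False)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (True, True)
    except OSError:
        return (True, False)
```

For the sandbox links in this thread to work, `can_reach("centos-2.local", 5051)` would need to return `(True, True)` from the browsing machine.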
Re: cluster wide init
If this was going to be used to allocate tasks outside of the scheduler's resource management, and for every slave, why not just use the OS-provided init system instead?

On 22 January 2015 at 19:40, Sharma Podila spod...@netflix.com wrote: Schedulers can only use resources on slaves that are unused by and unallocated to other schedulers. Therefore, schedulers cannot achieve this unless you reserve slots on every slave for the scheduler. Seems like a bit of a forced fit. Init-like support would be more fundamental to the Mesos cluster itself, if available.

On Thu, Jan 22, 2015 at 10:08 AM, Ryan Thomas r.n.tho...@gmail.com wrote: This seems more like the responsibility of the scheduler that is running, like marathon or aurora. I haven't tried it, but I would imagine that if you had 10 slaves and started a job with 11 tasks with host exclusivity, when you spin up an 11th slave marathon would start it there.

On Thursday, 22 January 2015, Sharma Podila spod...@netflix.com wrote: Just a thought looking forward... It might be useful to define an init kind of feature in Mesos slaves. Configuration can be defined in the Mesos master that lists services that must be run on all slaves. When slaves register, they get the list of services to run all the time. Updates to the configuration can be dynamically reflected on all slaves, and this therefore ensures that all slaves run the required services. Sophistication can be added to have different sets of services for different types of slaves (by resource types/quantity, etc.). Such a feature bodes well with Mesos being the DataCenter OS/Kernel.

On Thu, Jan 22, 2015 at 9:43 AM, CCAAT cc...@tampabay.rr.com wrote: OK, I'll take a look at the debian package. thanks, James

On 01/21/2015 11:10 PM, Shuai Lin wrote: You can always write the init wrapper scripts for marathon. There is an official debian package, which you can find in mesos's apt repo.

On Thu, Jan 22, 2015 at 4:20 AM, CCAAT cc...@tampabay.rr.com wrote: Hello all, I was reading about Marathon: Marathon scheduler processes were started outside of Mesos using init, upstart, or a similar tool [1]. So my related questions are: Does Marathon work with mesos + OpenRC as the init system? Are there any other frameworks that work with Mesos + OpenRC? James [1] http://mesosphere.github.io/marathon/
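For the OS-init route suggested at the top of this thread, a per-slave service is just a regular init job baked into the slave's provisioning; a hypothetical upstart example (the job name and binary path are invented - the same idea maps onto OpenRC or systemd):

```
# /etc/init/node-agent.conf -- hypothetical job that runs on every slave,
# outside Mesos's resource accounting
description "agent that must run on all slaves"
start on runlevel [2345]
respawn
exec /usr/local/bin/node-agent
```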
Re: Unable to follow Sandbox links from Mesos UI.
Hey Dan, The UI will attempt to pull that info directly from the slave, so you need to make sure the host is resolvable and routable from your browser. Cheers, Ryan From my phone
Re: Killing Docker containers
I've updated the RB with the feedback from the previous https://reviews.apache.org/r/26734/

On 15 October 2014 08:57, Ryan Thomas r.n.tho...@gmail.com wrote: Here is the RB link https://reviews.apache.org/r/26709/ - fixed at a 30 second timeout at the moment, but I'd imagine that this is something we want to make configurable. ryan

On 15 October 2014 08:32, Ankur Chauhan an...@malloc64.com wrote: ++ I was planning on submitting that patch. But if someone has this sorted out already, I'll defer. Sent from my iPhone

On Oct 14, 2014, at 2:19 PM, Ryan Thomas r.n.tho...@gmail.com wrote: The docker stop command will attempt to kill the container if it doesn't stop in 10 seconds by default. I think we should be using this with the -t flag to control the time between stop and kill rather than just using kill. I'll try to submit a patch. Cheers, ryan

On 15 October 2014 05:37, Scott Rankin sran...@crsinc.com wrote: Hi All, I’m working on prototyping Mesos+Marathon for our services platform, using apps deployed as Docker containers. Our applications register themselves with our service discovery framework on startup and un-register themselves when they shut down (assuming they shut down reasonably gracefully). What I’m finding is that when Mesos shuts down a docker container, it uses “docker kill” as opposed to “docker stop”. I can see the reasoning behind this, but it causes a problem in that the container doesn’t get a chance to clean up after itself. Is this something that might be addressed? Perhaps by trying docker stop and then running kill if it doesn’t shut down after 30 seconds or something? Thanks, Scott

This email message contains information that Corporate Reimbursement Services, Inc. considers confidential and/or proprietary, or may later designate as confidential and proprietary. It is intended only for use of the individual or entity named above and should not be forwarded to any other persons or entities without the express consent of Corporate Reimbursement Services, Inc., nor should it be used for any purpose other than in the course of any potential or actual business relationship with Corporate Reimbursement Services, Inc. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, please notify sender immediately and destroy the original message. Internal Revenue Service regulations require that certain types of written advice include a disclaimer. To the extent the preceding message contains advice relating to a Federal tax issue, unless expressly stated otherwise the advice is not intended or written to be used, and it cannot be used by the recipient or any other taxpayer, for the purpose of avoiding Federal tax penalties, and was not written to support the promotion or marketing of any transaction or matter discussed herein.
Re: Killing Docker containers
Latest review is here https://reviews.apache.org/r/26736/ - had to update due to style failures. Cheers, Ryan

On Thursday, 16 October 2014, Scott Rankin sran...@crsinc.com wrote: Thanks, Ryan. That solution sounds perfect.
Re: Killing Docker containers
The docker stop command will attempt to kill the container if it doesn't stop in 10 seconds by default. I think we should be using this with the -t flag to control the time between stop and kill rather than just using kill. I'll try to submit a patch. Cheers, ryan

On 15 October 2014 05:37, Scott Rankin sran...@crsinc.com wrote: Hi All, I’m working on prototyping Mesos+Marathon for our services platform, using apps deployed as Docker containers. Our applications register themselves with our service discovery framework on startup and un-register themselves when they shut down (assuming they shut down reasonably gracefully). What I’m finding is that when Mesos shuts down a docker container, it uses “docker kill” as opposed to “docker stop”. I can see the reasoning behind this, but it causes a problem in that the container doesn’t get a chance to clean up after itself. Is this something that might be addressed? Perhaps by trying docker stop and then running kill if it doesn’t shut down after 30 seconds or something? Thanks, Scott
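The stop-then-kill behaviour discussed in this thread maps onto `docker stop -t <seconds>`; a small sketch of building and running that command (the 30-second grace period mirrors Scott's suggestion; the container id here is whatever the containerizer tracks):

```python
import subprocess

def build_stop_command(container_id, grace_seconds=30):
    """docker stop sends SIGTERM, then SIGKILL once -t seconds elapse,
    giving the app a window to deregister from service discovery."""
    return ["docker", "stop", "-t", str(grace_seconds), container_id]

def stop_container(container_id, grace_seconds=30):
    # Raises CalledProcessError if docker reports a failure.
    subprocess.run(build_stop_command(container_id, grace_seconds), check=True)
```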
Docker containerizer port conflict
I've been trying out the docker integration with mesos + marathon since bridged networking was added, and I've run into a couple of issues - the most disturbing seems to be the allocation of already-in-use ports (I suspect this may be a marathon issue) and the failure to recover the tasks once this occurs.

What I am running is a very simple setup, driven locally from vagrant. I attempt to run the python3 container specified under Bridged Networking here: https://mesosphere.github.io/marathon/docs/native-docker.html. What I see is that, whilst the container is being pulled for the first time, every task exits as KILLED. Once the image has been pulled, the container starts but mesos does not realise this - causing it to fail to start additional containers due to port allocation conflicts. Killing the unrecognised container in docker will unblock mesos to start up the containers. Now, once this is started, if I attempt to scale the number of instances up in marathon, I see in the UI that it attempts to start another container (a third in my case, two slaves) with the same port allocations that are already in use on the slave.
This is the error in the slave logs: E1005 10:41:01.812988 2883 slave.cpp:2485] Container '05cf52f1-b915-45e5-9071-6b46fda3b71c' for executor 'bridged-webapp.18747ba3-4c7c-11e4-9567-080027100ea3' of framework '20141005-083953-159390892-5050-9177-' failed to start: Failed to 'docker run -d -c 512 -m 67108864 -e PORT=31000 -e PORT0=31000 -e PORTS=31000,31001 -e PORT1=31001 -e MESOS_SANDBOX=/mnt/mesos/sandbox -v /tmp/mesos/slaves/20141005-101854-159390892-5050-1326-0/frameworks/20141005-083953-159390892-5050-9177-/executors/bridged-webapp.18747ba3-4c7c-11e4-9567-080027100ea3/runs/05cf52f1-b915-45e5-9071-6b46fda3b71c:/mnt/mesos/sandbox --net bridge -p 31000:8080/tcp -p 31001:161/udp --entrypoint /bin/sh --name mesos-05cf52f1-b915-45e5-9071-6b46fda3b71c python:3 -c python3 -m http.server 8080': exit status = exited with status 1 stderr = WARNING: Your kernel does not support swap limit capabilities. Limitation discarded. 2014/10/05 10:41:01 Error response from daemon: Cannot start container b2516e3356ca1cf3163f6926249b4e936ec9afe4549ee37f4a9d5df62dbbaf1b: Bind for 0.0.0.0:31000 failed: port is already allocated There is nothing in the stderr or stdout of the task. I have setup the slaves according to the docs (set the containerizers and the timeout) - any help here would be appreciated. Cheers, Ryan
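A quick way to confirm the double allocation the log shows is to count (host, port) pairs across the tasks Marathon reports; a sketch assuming task dicts with 'host' and 'ports' keys roughly like Marathon's task listing:

```python
from collections import Counter

def port_conflicts(tasks):
    """tasks: [{'host': ..., 'ports': [...]}, ...] as reported by Marathon.
    Returns the (host, port) pairs that were handed out more than once."""
    counts = Counter(
        (t["host"], p) for t in tasks for p in t.get("ports", [])
    )
    return sorted(pair for pair, n in counts.items() if n > 1)
```

In the failure above this would report something like `[("slave-1", 31000), ("slave-1", 31001)]` once the duplicate allocation lands on a slave.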
Re: Frontend loadbalancer configuration for long running tasks
Hi Ankur, I saw this on the mesos subreddit not five minutes ago! http://www.qubitproducts.com/content/Opensourcing-Bamboo Cheers, Ryan On 9 Sep 2014 18:53, Ankur Chauhan an...@malloc64.com wrote: Hi all, (Please let me know if this is not the correct place for such a question). I have been looking at mesos + marathon + haproxy as a way of deploying long running web applications. Mesos coupled with marathon's /tasks api gives me all the information needed to get a haproxy configured and load balancing all the tasks but it seems a little too simplistic. I was wondering if there are other projects or if others could share how they configure/reconfigure their loadbalancers when new tasks come alive. Just to make things a little more concrete consider the following use case: There are two web applications that are running as tasks on mesos: 1. webapp1 (http + https) on app1.domain.com 2. webapp2 (http + https) on app2.domain.com We want to configure a HAProxy server that routes traffic from users (:80 and :443) and loadbalances it correctly onto the correct set of tasks. Obviously there is some haproxy configuration happening here but i am interested in finding out what others have been doing in similar cases before I go around building yet another haproxy reconfigure and reload script. -- Ankur
Re: Mesos webcast
Hey Vinod, Will this be recorded? It starts at 4am in my timezone :) Cheers, Ryan On 10 September 2014 03:22, Vinod Kone vinodk...@gmail.com wrote: Hi folks, I'm doing a webcast on Mesos this thursday (h/t Mesosphere) where I will talk about some of the core features of Mesos (slave recovery, authentication and authorization). At the end, we will have time for QA for any and all questions related to Mesos. More details: https://attendee.gotowebinar.com/register/7957587123935365890 Thanks,
Re: Mesos 0.20.0 with Docker registry availability
Whilst this is somewhat unrelated to the mesos implementation, I think it is generally good practice to have immutable tags on images; this is something I dislike about docker :) Whilst the gc of old images will eventually become a problem, it will really only be the layer delta that is consumed with each new tag. But yes, I think there would need to be some mechanism to clear out the images in the local registry. ryan

On 5 Sep 2014 18:03, mccraig mccraig mccraigmccr...@gmail.com wrote: ah, so i will have to use a different tag to update an app. one immediate problem i can see is that it makes garbage collecting old docker images from slaves harder: currently i update the image associated with a tag and restart tasks to update the running app, then occasionally run a cron job to remove all docker images with no tag. if every updated image has a new tag it will be harder to figure out which images to remove... perhaps any with no running container, though that could lead to unnecessary pulls and slower restarts of failed tasks :craig

On 5 Sep 2014, at 08:43, Ryan Thomas r.n.tho...@gmail.com wrote: Hey Craig, docker run will attempt a pull of the image if it cannot find a matching image and tag in its local repository. So it should only pull on the first run of a given tag. ryan

On 5 Sep 2014 17:41, mccraig mccraig mccraigmccr...@gmail.com wrote: hi tim, if it doesn't pull on every run, when will it pull? :craig

On 5 Sep 2014, at 07:05, Tim Chen t...@mesosphere.io wrote: Hi Maxime, It is a very valid concern, and that's why I've added a patch that should go out in 0.20.1 to not do a docker pull on every run anymore. Mesos will still try to docker pull when the image isn't available locally (via docker inspect), but only once. The downside of course is that you're not able to automatically get the latest tagged image, but I think it's a worthwhile price to pay to gain the benefits of not depending on the registry, being able to run local images, and more. Tim

On Thu, Sep 4, 2014 at 10:50 PM, Maxime Brugidou maxime.brugi...@gmail.com wrote: Hi, The current Docker integration in 0.20 does a docker pull from the registry before running any task. This means that your entire Mesos cluster becomes unusable if the registry goes down. The docs allow you to configure a custom .dockercfg for your tasks to point to a private docker registry. However it is not easy to run an HA docker registry. The docker-registry project recommends using S3 storage, but this is definitely not an option for some people. I know that for regular artifacts, Mesos can use HDFS storage and you can run your HDFS datanodes as Mesos tasks. So even if I attempt to have docker registry storage in HDFS (which is not supported by docker-registry at the moment), I am stuck on a chicken-and-egg problem. I want to have as few services outside of Mesos as possible, and it is hard to maintain HA services (especially outside of Mesos). Is there anyone running Mesos with Docker in production without S3? I am trying to make all the services outside of Mesos (the infra services necessary to run Mesos, like DNS, Haproxy, Chef server... etc) either HA or not critical for the cluster to run. The docker registry is a new piece of infra outside of Mesos that is critical... Best, Maxime
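craig's proposed GC policy - with immutable tags, remove any image that no running container references - is easy to sketch; the input shapes (image id lists, e.g. from `docker images -q` and from inspecting `docker ps`) are assumptions for the example:

```python
def removable_images(local_images, running_image_ids):
    """local_images: image ids present on the slave; running_image_ids: image
    ids used by currently running containers. Anything unreferenced is a GC
    candidate -- at the cost of a re-pull if a task using that image is
    restarted on this slave later, as craig notes above."""
    return sorted(set(local_images) - set(running_image_ids))
```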
Re: Issue with Multinode Cluster
If you're using the mesos-init-wrapper you can write the IP to /etc/mesos-master/ip and that flag will be set. This goes for all the flags, and can be done for the slave as well in /etc/mesos-slave.

On 26 August 2014 10:18, Vinod Kone vinodk...@gmail.com wrote: From the logs, it looks like the master is binding to its loopback address (127.0.0.1) and publishing that to ZK. So the slave is trying to reach the master on its loopback interface, which is failing. Start the master with the --ip flag set to its visible ip (10.1.100.116). Mesosphere probably has a file (/etc/defaults/mesos-master?) to set these flags.

On Mon, Aug 25, 2014 at 3:26 PM, Frank Hinek frank.hi...@gmail.com wrote: Logs attached from master, slave, and zookeeper after a reboot of both nodes.

On August 25, 2014 at 1:14:07 PM, Vinod Kone (vinodk...@gmail.com) wrote: what do the master and slave logs say?

On Mon, Aug 25, 2014 at 9:03 AM, Frank Hinek frank.hi...@gmail.com wrote: I was able to get a single-node environment set up on Ubuntu 14.04.1 following this guide: http://mesosphere.io/learn/install_ubuntu_debian/ The single slave registered with the master via the local Zookeeper and I could run basic commands by posting to Marathon. I then tried to build a multi-node cluster following this guide: http://mesosphere.io/docs/mesosphere/getting-started/cloud-install/ The guide walks you through using the Mesosphere packages to install Mesos, Marathon, and Zookeeper on one node that will be the master, and on the slave just Mesos. You then disable automatic start of: mesos-slave on the master, mesos-master on the slave, and zookeeper on the slave. It ends up looking like:

NODE 1 (MASTER): - IP Address: 10.1.100.116 - mesos-master - marathon - zookeeper

NODE 2 (SLAVE): - IP Address: 10.1.100.117 - mesos-slave

The issue I’m running into is that the slave is rarely able to register with the master via Zookeeper. I can never run any jobs from marathon (just trying a simple sleep 5 command).
Even when the slave does register, the Mesos UI shows 1 "Deactivated" slave; it never goes active. Here are the values I have for /etc/mesos/zk:

MASTER: zk://10.1.100.116:2181/mesos
SLAVE: zk://10.1.100.116:2181/mesos

Any ideas of what to troubleshoot? Would greatly appreciate pointers. Environment details:
- Ubuntu Server 14.04.1 running as VMs on ESXi 5.5U1
- Mesos: 0.20.0
- Marathon: 0.6.1

There are no apparent connectivity issues, and I'm not having any problems with other VMs on the ESXi host. All VM-to-VM communication is on the same VLAN and within the same host.

Zookeeper log on master (the slave briefly registered, so I tried to run a sleep 5 command from Marathon and then the slave disconnected):

2014-08-25 11:50:34,976 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.1.100.117:45778
2014-08-25 11:50:34,977 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old client /10.1.100.117:45778; will be dropped if server is in r-o mode
2014-08-25 11:50:34,977 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish new session at /10.1.100.117:45778
2014-08-25 11:50:34,978 - INFO [SyncThread:0:ZooKeeperServer@595] - Established session 0x1480b22f7fc with negotiated timeout 1 for client /10.1.100.117:45778
2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f1 type:create cxid:0x53faafa9 zxid:0x49 txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for /marathon
2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f1 type:create cxid:0x53faafaa zxid:0x4a txntype:-1 reqpath:n/a Error Path:/marathon/state Error:KeeperErrorCode = NodeExists for /marathon/state
2014-08-25 11:51:09,145 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f1 type:create cxid:0x53faafb5 zxid:0x4d txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for /marathon
2014-08-25 11:51:09,146 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f1 type:create cxid:0x53faafb6 zxid:0x4e txntype:-1 reqpath:n/a Error Path:/marathon/state Error:KeeperErrorCode = NodeExists for /marathon/state
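To illustrate the flag-file mechanism Ryan describes: every file under /etc/mesos-master becomes a --&lt;filename&gt;=&lt;contents&gt; flag on the master process (same for /etc/mesos-slave). The loop below is a sketch of that translation under a temp directory, not the wrapper's actual code, and the port value 5050 is added here purely for illustration:

```shell
# Simulate the wrapper's config directory with a temp dir.
conf=$(mktemp -d)
echo 10.1.100.116 > "$conf/ip"
echo 5050 > "$conf/port"

# Each file becomes a --<name>=<contents> flag, as mesos-init-wrapper does.
flags=""
for f in "$conf"/*; do
  flags="$flags --$(basename "$f")=$(cat "$f")"
done

echo "mesos-master$flags"
# → mesos-master --ip=10.1.100.116 --port=5050
```

On a real Mesosphere install you would simply write the file (e.g. `echo 10.1.100.116 | sudo tee /etc/mesos-master/ip`) and restart the mesos-master service so it re-binds to the visible interface instead of loopback.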
Re: Issue with Multinode Cluster
I'm not sure what the best practice is, but I use the /etc/mesos* method as I find it more explicit.

On 26 August 2014 10:38, Frank Hinek frank.hi...@gmail.com wrote: Vinod: bingo! I've spent 2 days trying to figure this out. The only interfaces on the VMs were eth0 and lo; interesting that it picked the loopback automatically, or that the tutorials didn't note this. Ryan: Is it considered better practice to modify /etc/default/mesos-master or to write the IP to /etc/mesos-master/ip?

On August 25, 2014 at 8:31:42 PM, Ryan Thomas (r.n.tho...@gmail.com) wrote: If you're using the mesos-init-wrapper you can write the IP to /etc/mesos-master/ip and that flag will be set. This goes for all the flags, and can be done for the slave as well in /etc/mesos-slave.

On 26 August 2014 10:18, Vinod Kone vinodk...@gmail.com wrote: From the logs, it looks like the master is binding to its loopback address (127.0.0.1) and publishing that to ZK. So the slave is trying to reach the master on its loopback interface, which is failing. Start the master with the --ip flag set to its visible IP (10.1.100.116). Mesosphere probably has a file (/etc/defaults/mesos-master?) to set these flags.

On Mon, Aug 25, 2014 at 3:26 PM, Frank Hinek frank.hi...@gmail.com wrote: Logs attached from master, slave, and zookeeper after a reboot of both nodes.

On August 25, 2014 at 1:14:07 PM, Vinod Kone (vinodk...@gmail.com) wrote: what do the master and slave logs say?

On Mon, Aug 25, 2014 at 9:03 AM, Frank Hinek frank.hi...@gmail.com wrote: I was able to get a single-node environment set up on Ubuntu 14.04.1 following this guide: http://mesosphere.io/learn/install_ubuntu_debian/ The single slave registered with the master via the local Zookeeper and I could run basic commands by posting to Marathon.
I then tried to build a multi-node cluster following this guide: http://mesosphere.io/docs/mesosphere/getting-started/cloud-install/ The guide walks you through using the Mesosphere packages to install Mesos, Marathon, and Zookeeper on one node that will be the master, and on the slave just Mesos. You then disable automatic start of: mesos-slave on the master, mesos-master on the slave, and zookeeper on the slave. It ends up looking like:

NODE 1 (MASTER):
- IP Address: 10.1.100.116
- mesos-master
- marathon
- zookeeper

NODE 2 (SLAVE):
- IP Address: 10.1.100.117
- mesos-slave

The issue I'm running into is that the slave is rarely able to register with the master via Zookeeper. I can never run any jobs from Marathon (just trying a simple sleep 5 command). Even when the slave does register, the Mesos UI shows 1 "Deactivated" slave; it never goes active. Here are the values I have for /etc/mesos/zk:

MASTER: zk://10.1.100.116:2181/mesos
SLAVE: zk://10.1.100.116:2181/mesos

Any ideas of what to troubleshoot? Would greatly appreciate pointers. Environment details:
- Ubuntu Server 14.04.1 running as VMs on ESXi 5.5U1
- Mesos: 0.20.0
- Marathon: 0.6.1

There are no apparent connectivity issues, and I'm not having any problems with other VMs on the ESXi host. All VM-to-VM communication is on the same VLAN and within the same host.
Zookeeper log on master (the slave briefly registered, so I tried to run a sleep 5 command from Marathon and then the slave disconnected):

2014-08-25 11:50:34,976 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.1.100.117:45778
2014-08-25 11:50:34,977 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old client /10.1.100.117:45778; will be dropped if server is in r-o mode
2014-08-25 11:50:34,977 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish new session at /10.1.100.117:45778
2014-08-25 11:50:34,978 - INFO [SyncThread:0:ZooKeeperServer@595] - Established session 0x1480b22f7fc with negotiated timeout 1 for client /10.1.100.117:45778
2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f1 type:create cxid:0x53faafa9 zxid:0x49 txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for /marathon
2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f1 type:create cxid:0x53faafaa zxid:0x4a txntype:-1 reqpath:n/a Error Path:/marathon/state Error:KeeperErrorCode = NodeExists for /marathon/state
2014-08-25 11:51:09,145 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f1 type:create cxid:0x53faafb5 zxid:0x4d txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for /marathon
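Frank's question about /etc/default/mesos-master versus /etc/mesos-master/ip comes down to two ways of passing the same --ip flag. A minimal sketch of both, assuming the Mesosphere packaging; the MESOS_IP variable name relies on Mesos reading MESOS_-prefixed environment variables as flags:

```shell
# Option 1 (Ryan's preference): one file per flag under /etc/mesos-master/.
# The file name is the flag name, the contents are its value.
echo 10.1.100.116 | sudo tee /etc/mesos-master/ip

# Option 2: environment-style defaults file sourced at service start;
# Mesos maps MESOS_IP to the --ip flag.
echo 'export MESOS_IP=10.1.100.116' | sudo tee -a /etc/default/mesos-master

# Either way, restart the master so it re-binds to the visible interface
# (10.1.100.116) instead of loopback.
sudo service mesos-master restart
```

The flag-file style has the advantage that each setting is a separate, greppable file, which is presumably what Ryan means by "more explicit".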
Re: MesosCon attendee introduction thread
Hi all, My name is Ryan Thomas and I am a development team lead at Atlassian in Sydney, Australia. We have been playing with Mesos / Marathon / Aurora / Docker since about November last year, and I'm really keen to talk to people about the development and deployment process around microservices. I'm occasionally in the IRC channels, and on Twitter as @hobos_delight. Cheers, and looking forward to chatting with everyone! ryan

On 15 August 2014 09:14, Ray Rodriguez rayrod2...@gmail.com wrote: Hi everyone, My name is Ray Rodriguez and I am a data infrastructure engineer on the data science team at Sailthru in New York City. I first started experimenting with Mesos/Marathon/Chronos about 8 months ago and am currently building out a Spark cluster running on Mesos. I'm also into all things automation/CM/infrastructure-as-code, including chef, consul/etcd, zookeeper, docker, and coreos. I'm the author of a couple of Mesos cookbooks and recently contributed a collectd plugin for parsing Mesos stats (https://github.com/rayrod2030/collectd-mesos). Looking forward to talking to everyone about their experiences running Spark on Mesos in production and the rest of the Mesos ecosystem. Twitter: @rayray2030

On Thu, Aug 14, 2014 at 7:05 PM, Dave Lester daveles...@gmail.com wrote: Hi All, I thought it would be nice to kick off a thread for folks to introduce themselves in advance of #MesosCon http://events.linuxfoundation.org/events/mesoscon, so here goes: My name is Dave Lester, and I am Open Source Advocate at Twitter. Twitter is an organizing sponsor for #MesosCon, and I've worked closely with Chris Aniszczyk, the Linux Foundation, and a great team of volunteers to hopefully make this an awesome community event. I'm interested in meeting more companies using Mesos that we can add to our #PoweredByMesos list http://mesos.apache.org/documentation/latest/powered-by-mesos/, and chatting with folks about Apache Aurora http://aurora.incubator.apache.org.
Right now my Thursday and Friday evenings are free, so let's grab a beer and chat more. I'm also on Twitter: @davelester. Next!
Re: MesosCon attendee introduction thread
Hey David, I'll be keen to have a chat; we've been using Clojure for a bit of work as well. ryan

On 15 August 2014 12:08, David Greenberg dsg123456...@gmail.com wrote: I'm David Greenberg, and I work at Two Sigma. I've been rearchitecting our main compute cluster to run on top of Mesos, and I've been developing an internal framework with an interesting, different scheduler model. Another member of our team will also be at the conference. I'm really excited to talk to others about using Mesos from Clojure and developing custom frameworks!

On Thu, Aug 14, 2014 at 9:15 PM, Bill Farner wfar...@apache.org wrote: I'm Bill Farner, tech lead of the Aurora team at Twitter for the past 4+ years, and an Aurora committer. I will be giving a talk detailing some of the history of Aurora and explaining some new features we have on the roadmap. We Aurora developers have been really excited to see the project successfully in use by other companies, and I can't wait to discuss details with folks at the conference! -=Bill

On Thu, Aug 14, 2014 at 4:55 PM, Brian Wickman wick...@apache.org wrote: I'm Brian Wickman (@wickman http://twitter.com/wickman) from the cloud infrastructure group at Twitter and an Aurora committer. I'll be around both days, probably spending Friday hacking on pesos https://github.com/wickman/pesos and related projects. I'd also be happy to give ad-hoc Aurora tutorials during the Hackathon; advanced Aurora configuration and/or hacking on the Aurora executor come to mind. Looking forward to meeting everyone! ~brian

On Thu, Aug 14, 2014 at 4:05 PM, Dave Lester daveles...@gmail.com wrote: Hi All, I thought it would be nice to kick off a thread for folks to introduce themselves in advance of #MesosCon http://events.linuxfoundation.org/events/mesoscon, so here goes: My name is Dave Lester, and I am Open Source Advocate at Twitter.
Twitter is an organizing sponsor for #MesosCon, and I've worked closely with Chris Aniszczyk, the Linux Foundation, and a great team of volunteers to hopefully make this an awesome community event. I'm interested in meeting more companies using Mesos that we can add to our #PoweredByMesos list http://mesos.apache.org/documentation/latest/powered-by-mesos/, and chatting with folks about Apache Aurora http://aurora.incubator.apache.org. Right now my Thursday and Friday evenings are free, so let's grab a beer and chat more. I'm also on Twitter: @davelester. Next!