Re: Deploying containers to every mesos slave node
On 03/12/2015 02:00 PM, Tim St Clair wrote:

You may want to also view https://issues.apache.org/jira/browse/MESOS-1806, as folks have discussed straight-up Consul integration on that JIRA.

Any plans to resolve this JIRA for the upcoming 0.22 release?

- Gurvinder

*From:* Aaron Carey aca...@ilm.com
*To:* user@mesos.apache.org
*Sent:* Thursday, March 12, 2015 3:54:52 AM
*Subject:* Deploying containers to every mesos slave node

Hi All,

In setting up our cluster, we require things like Consul to be running on all of our nodes. I was just wondering if there is any sort of best practice (or a scheduler, perhaps) that people could share for this sort of thing? Currently the approach is to use Salt to provision each node and add the consul/mesos-slave processes and so on to it, but it'd be nice to remove the dependency on Salt.

Thanks,
Aaron

--
Cheers,
Timothy St. Clair
Red Hat Inc.
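For running one Consul container per node through Mesos itself, one commonly cited pattern (my sketch, not something from this thread) is a Marathon app with a `hostname`/`UNIQUE` constraint and `instances` set to the node count. The app id, image name, and resource numbers below are illustrative assumptions:

```json
{
  "id": "consul-agent",
  "instances": 10,
  "cpus": 0.1,
  "mem": 128,
  "constraints": [["hostname", "UNIQUE"]],
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "example/consul:latest",
      "network": "HOST"
    }
  }
}
```

POST this to Marathon's /v2/apps endpoint; the UNIQUE constraint keeps Marathon from placing two instances on the same host, though it does not guarantee coverage of every node the way a true per-node scheduler (or Salt) does.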
Re: mesos on coreos
Thanks Alex for the information, and others too for sharing their experiences.

- Gurvinder

On 03/11/2015 07:50 PM, Alex Rukletsov wrote:

Gurvinder, no, there are no publicly available binaries, nor documentation, at this point. We will publish either or both as soon as it is rock solid.

On Wed, Mar 11, 2015 at 2:08 AM, Gurvinder Singh gurvinder.si...@uninett.no wrote:

On 03/10/2015 11:41 PM, Tim Chen wrote:

Hi all, As Alex said, you can run Mesos on CoreOS without Docker if you put the dependencies in.

Tim, is there any documentation on using Mesos outside a container on CoreOS, or a binary available which we can wget in the cloud-init file to fulfill the dependencies? We would like to test out Mesos on CoreOS outside Docker.

- Gurvinder

It is a common ask though to run mesos-slave in a Docker container in general, whether on CoreOS or not. It's definitely a bit involved, as you need to mount in a directory for persisting the work dir and also mount in /sys/fs for cgroups; you should also use the --pid=host flag (since Docker 1.5) so it shares the host pid namespace. Although you get a lot less isolation, there are still motivations to run the slave in Docker regardless. One thing that's missing from the Mesos Docker containerizer is that it won't be able to recover tasks on restart, and I have a series of patches pending review to fix that.

Tim

On Tue, Mar 10, 2015 at 3:16 PM, Alex Rukletsov a...@mesosphere.io wrote:

My 2¢. First of all, it doesn’t look like a great idea to package the resource manager into Docker, putting one more abstraction layer between a resource itself and the resource manager. You can run mesos-slave on a CoreOS node without putting it into a Docker container.

—Alex
Re: mesos on coreos
On 03/10/2015 11:41 PM, Tim Chen wrote:

Hi all, As Alex said, you can run Mesos on CoreOS without Docker if you put the dependencies in.

Tim, is there any documentation on using Mesos outside a container on CoreOS, or a binary available which we can wget in the cloud-init file to fulfill the dependencies? We would like to test out Mesos on CoreOS outside Docker.

- Gurvinder

It is a common ask though to run mesos-slave in a Docker container in general, whether on CoreOS or not. It's definitely a bit involved, as you need to mount in a directory for persisting the work dir and also mount in /sys/fs for cgroups; you should also use the --pid=host flag (since Docker 1.5) so it shares the host pid namespace. Although you get a lot less isolation, there are still motivations to run the slave in Docker regardless. One thing that's missing from the Mesos Docker containerizer is that it won't be able to recover tasks on restart, and I have a series of patches pending review to fix that.

Tim

On Tue, Mar 10, 2015 at 3:16 PM, Alex Rukletsov a...@mesosphere.io wrote:

My 2¢. First of all, it doesn’t look like a great idea to package the resource manager into Docker, putting one more abstraction layer between a resource itself and the resource manager. You can run mesos-slave on a CoreOS node without putting it into a Docker container.

—Alex
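Tim's mounting advice can be sketched as a single docker run invocation. This is an untested command sketch; the image name, host paths, and ZooKeeper address are my placeholders, not from the thread:

```shell
# Run mesos-slave in a container, sharing the pieces Tim lists:
# a persistent work dir, the cgroup hierarchy, and the host pid namespace.
docker run -d \
  --name mesos-slave \
  --net=host \
  --pid=host \
  --privileged \
  -v /var/lib/mesos:/var/lib/mesos \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  example/mesos-slave \
  --master=zk://10.0.0.1:2181/mesos \
  --work_dir=/var/lib/mesos
```

The host-mounted --work_dir is what lets a restarted slave container find its previous state, modulo the task-recovery gap in the Docker containerizer that Tim mentions.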
Re: mesos on coreos
Hi Michael,

Yes, I tested the tutorial and it works fine for testing. Later on I used fleet to run the mesos workers on all the CoreOS machines too. I was wondering how the landscape is looking in the community regarding CoreOS: is there any interest from the community or the Mesos team to support CoreOS in general? If yes, then how do you see where Mesos fits in with Fleet and Kubernetes? My current understanding is that Fleet is useful for lightweight scheduling, whereas Mesos and Kubernetes serve a similar purpose. Mesos has been here for a while and is more feature-complete than Kubernetes, but Kubernetes has tighter integration with CoreOS, like using etcd for coordination and flannel for networking. I wonder what the plans are for Mesos on this front. I have seen the JIRA for etcd (https://issues.apache.org/jira/browse/MESOS-1806). I understand that the landscape is changing fast, but it's good to know about the Mesos roadmap in this regard. I would also love to know if anybody is using CoreOS with Mesos beyond testing.

Thanks,
Gurvinder

On 03/09/2015 11:35 PM, Michael Park wrote:

Hi Gurvinder,

We got started on this work at Mesosphere, and there's a tutorial (http://mesosphere.com/docs/tutorials/mesosphere-on-a-single-coreos-instance/) on how to do a single-node setup. We ran the mesos-master and slaves in Docker containers, which led to this JIRA ticket: https://issues.apache.org/jira/browse/MESOS-2115. I haven't been able to follow up on this article recently, and I'd like to hear from others who have made further progress as well. At the time, we were thinking that using fleet shouldn't be too difficult since it uses the systemd unit files, but didn't quite get around to it. Perhaps you'll find the tutorial to be a decent starting point.

Thanks,
MPark.

On 9 March 2015 at 17:52, Gurvinder Singh gurvinder.si...@uninett.no wrote:

Hi,

I am wondering if anybody in the community has looked into, or is running, Mesos on top of CoreOS. I would be interested to hear about your experiences around the following areas:

- User management on the CoreOS cluster and in containers running with Mesos
- Are you using fleet to run Mesos, or running it as a service in cloud-config and not using fleet at all?
- Networking among hosts: flannel or something else?
- Any other interesting insights you found considering such a setup

Thanks,
Gurvinder
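Running mesos workers on all CoreOS machines via fleet, as mentioned above, is usually done with a global unit. Below is a hedged sketch of such a unit; the Docker image, mounts, and ZooKeeper address are placeholders, not a configuration from the thread:

```ini
# mesos-slave.service -- submit with `fleetctl start mesos-slave.service`
[Unit]
Description=Mesos slave
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill mesos-slave
ExecStartPre=-/usr/bin/docker rm mesos-slave
ExecStart=/usr/bin/docker run --name mesos-slave --net=host --pid=host \
  --privileged -v /var/lib/mesos:/var/lib/mesos \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  example/mesos-slave --master=zk://10.0.0.1:2181/mesos --work_dir=/var/lib/mesos
ExecStop=/usr/bin/docker stop mesos-slave

[X-Fleet]
Global=true
```

`Global=true` in the [X-Fleet] section is what tells fleet to schedule the unit on every machine in the cluster rather than on a single one.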
Re: mesos on coreos
Thanks Anton for sharing your experience. Responses inline.

On 03/10/2015 01:01 PM, Anton Kirillov wrote:

Hi Gurvinder,

Our team has experience with Mesos on CoreOS with fleet, and we decided to switch to bare-metal deployments; here are our main reasons. First of all, it doesn’t look like a great idea to package the resource manager into Docker, putting one more abstraction layer between a resource itself and the resource manager.

I agree. That was the main reason I asked about closer integration of Mesos with CoreOS. If you look here, Kubernetes (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/coreos/cloud-configs/master.yaml) runs natively on CoreOS, not in a container, as it is started by the cloud-init process. Something similar for Mesos would resolve this issue. Although Kubernetes can do this because it is a simple Go binary with no dependencies, I looked at the Mesos library dependencies and compared them with the libraries on CoreOS: only 2 are missing (libmesos-version.so, libsasl2.so). So I think it is possible for Mesos to follow the same model as Kubernetes and run natively.

From a DevOps point of view it is hard to control such things as ZooKeeper restarts (and ensemble rolling restarts as well), ZooKeeper being the core service discovery mechanism for Mesos. You have to put some sidekick services in place to provide the peer discovery mechanics, and it doesn’t look very robust.

That's why I mentioned the JIRA where Mesos can use etcd. The sidekick problem could be solved by using flannel (https://github.com/coreos/flannel) to make each container addressable inside your cluster.

A very common use case with Mesos is running Docker on top of it, either with Marathon or with Aurora. But the Docker service needs to be installed on the worker nodes, so you’re coming to a Docker-in-Docker situation which cancels all the advantages of both transparent resource management and simple deployment configuration. One more point on Mesos inside Docker here is that you have to attach the Mesos data directories from the container to the host. Given that you’re already running the Mesos container in privileged mode and sharing directories with state with the host, there is no more reason to run Mesos inside a container. And consider container restart (not just failure) with registry corruption and the synchronization issues that follow. Another use case of ours, with multi-region cluster deployments, showed some issues with etcd heartbeat/leader-election timeouts, which need to be increased in order to handle the bigger latencies between data centers. If the timeouts are increased, fleet starts to work in an unpredictable way, losing and finding peer nodes again, which is not appropriate in a production environment.

I have no experience with multi-region deployments. Such a scenario can also be hard for ZooKeeper, whereas Consul claims to address this issue. The reason for asking is that with CoreOS we have a small-footprint, up-to-date OS which can boot Mesos to manage the whole cluster. By using Docker, we can have multi-tenancy support too. Just ideas :P

- Gurvinder

You can take a look at this configuration for Mesos-CoreOS-HA as well: https://github.com/akirillov/mesos-deploy/tree/master/mesos-coreos-ha

--
Anton Kirillov
Sent with Sparrow
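Gurvinder's point about starting Mesos natively from cloud-init, the way the linked Kubernetes master.yaml does, could look roughly like the fragment below. This is a hypothetical sketch: the download URL, binary name, and flags are placeholders, not a tested configuration:

```yaml
#cloud-config
coreos:
  units:
    - name: mesos-slave.service
      command: start
      content: |
        [Unit]
        Description=Mesos slave (native binary, no container)
        After=network-online.target

        [Service]
        # Hypothetical URL: fetch a statically hosted mesos-slave build,
        # assuming the two missing libraries have been bundled alongside it.
        ExecStartPre=/usr/bin/wget -O /opt/mesos-slave https://example.com/mesos/mesos-slave
        ExecStartPre=/usr/bin/chmod +x /opt/mesos-slave
        ExecStart=/opt/mesos-slave --master=zk://10.0.0.1:2181/mesos --work_dir=/var/lib/mesos
        Restart=always
```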
Re: mesos on coreos
On 03/10/2015 03:57 PM, Anton Kirillov wrote:

Gurvinder, your points are really interesting to consider, but to me it still looks like a bit of a narrow solution, because not all widespread systems have frameworks to run on Mesos. It really depends on your goals, though. We have pretty specific use cases, one of them being Spark on Mesos to achieve HA, alongside Cassandra as the datastore. So we install the Mesos slave and a Cassandra node on the same machine to achieve greater data locality. And Docker is a pretty poor choice for running Cassandra, but there is no other way to run Cassandra on CoreOS (afaik).

We also plan to run different frameworks on our cluster, Spark and Cassandra being among them. I would like to know what issues you faced while running Cassandra in Docker, as Docker with a volume attached has almost bare-metal performance.

Another thought is that when you go to "big iron", the OS footprint doesn’t matter a lot once you have multi-core, huge-RAM hardware. It looks like premature optimization.

Optimization is one thing, but with CoreOS you get more deterministic updates of the OS, with a rollback option, which can be quite useful when running a large infrastructure. The current discussion is to get a sense of the community's feeling about these ideas; it is not to say that this is the best solution :)

- Gurvinder

My points just come from recent experience and are more problem-oriented. But it would be really nice to see Mesos as a native CoreOS service to experiment with.

--
Anton Kirillov
mesos on coreos
Hi,

I am wondering if anybody in the community has looked into, or is running, Mesos on top of CoreOS. I would be interested to hear about your experiences around the following areas:

- User management on the CoreOS cluster and in containers running with Mesos
- Are you using fleet to run Mesos, or running it as a service in cloud-config and not using fleet at all?
- Networking among hosts: flannel or something else?
- Any other interesting insights you found considering such a setup

Thanks,
Gurvinder
Re: logstash config
This is the config we use for mesos logs. In the patterns file:

MESOSTIMESTAMP %{MONTHNUM}%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}
MESOSLOG %{DATA:loglevel}%{MESOSTIMESTAMP:timestamp}\s+%{POSINT:pid}\s+%{NOTSPACE:class}:

And in the logstash config:

if [type] == "mesos" {
  grok {
    patterns_dir => ["<path to your patterns dir>"]
    match => ["message", "%{MESOSLOG}"]
  }
}

- Gurvinder

On 03/06/2015 03:40 AM, David J. Palaitis wrote:

Anyone out there have a logstash config for the Mesos log format they'd like to share? I'm finding the date format stubbornly difficult to map to a timestamp.
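As a sanity check, the grok pattern above can be approximated with a plain Python regex and run against a sample Mesos glog line (the sample line is taken from elsewhere in this thread; the regex is my rough translation, not the grok engine itself):

```python
import re

# Rough Python translation of the MESOSLOG grok pattern:
# %{DATA:loglevel}%{MESOSTIMESTAMP:timestamp}\s+%{POSINT:pid}\s+%{NOTSPACE:class}:
MESOS_LOG = re.compile(
    r"(?P<loglevel>.*?)"                                             # DATA: lazily grabs the glog level letter
    r"(?P<timestamp>\d{2}\d{2} \d{1,2}:\d{1,2}:\d{1,2}(?:\.\d+)?)"   # MMDD HH:MM:SS.ffffff
    r"\s+(?P<pid>[1-9]\d*)"                                          # POSINT
    r"\s+(?P<class>\S+):"                                            # NOTSPACE, backtracks to the colon
)

sample = ("I0915 20:55:18.293903 31434 containerizer.cpp:392] "
          "Starting container 'fe4b344f' for executor")
m = MESOS_LOG.search(sample)
print(m.groupdict())
# {'loglevel': 'I', 'timestamp': '0915 20:55:18.293903',
#  'pid': '31434', 'class': 'containerizer.cpp'}
```

Note that `\S+:` backtracks so `class` captures only `containerizer.cpp`, with the source line number left unmatched, mirroring what the grok NOTSPACE pattern does on the same input.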
Re: spark and mesos issue
It might not be related only to the memory issue. The memory issue is also there, as you mentioned; I have seen that one too. The fine-grained mode issue is mainly Spark considering that it got two different block managers for the same ID, whereas if I search for the ID in the mesos slaves, it exists only on one slave, not on multiple of them. This might be due to the size of the ID, as Spark outputs the error as

14/09/16 08:04:29 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140822-112818-711206558-5050-25951-0

whereas in the mesos slaves I see logs such as

I0915 20:55:18.293903 31434 containerizer.cpp:392] Starting container '3aab2237-d32f-470d-a206-7bada454ad3f' for executor '20140822-112818-711206558-5050-25951-0' of framework '20140822-112818-711206558-5050-25951-0053'
I0915 20:53:28.039218 31437 containerizer.cpp:392] Starting container 'fe4b344f-16c9-484a-9c2f-92bd92b43f6d' for executor '20140822-112818-711206558-5050-25951-0' of framework '20140822-112818-711206558-5050-25951-0050'

Note that the last 3 digits of the ID are missing in Spark, whereas they differ between the mesos slaves.

- Gurvinder

On 09/15/2014 11:13 PM, Brenden Matthews wrote:

I started hitting a similar problem, and it seems to be related to memory overhead and tasks getting OOM-killed. I filed a ticket here: https://issues.apache.org/jira/browse/SPARK-3535

On Wed, Jul 16, 2014 at 5:27 AM, Ray Rodriguez rayrod2...@gmail.com wrote:

I'll set some time aside today to gather and post some logs and details about this issue from our end.

On Wed, Jul 16, 2014 at 2:05 AM, Vinod Kone vinodk...@gmail.com wrote:

On Tue, Jul 15, 2014 at 11:02 PM, Vinod Kone vi...@twitter.com wrote:

On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh gurvinder.si...@uninett.no wrote:

ERROR storage.BlockManagerMasterActor: Got two different block manager registrations on 201407031041-1227224054-5050-24004-0

Googling about it, it seems that mesos is starting slaves at the same time and giving them the same id. So maybe a bug in mesos?

Has this issue been resolved? We need more information to triage this. Maybe some logs that show the lifecycle of the duplicate instances?

@vinodkone
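To make Gurvinder's observation concrete, here is a tiny illustration (mine, not from the thread) showing that the two mesos log lines report distinct framework IDs even though the executor-ID prefix that Spark prints is identical:

```python
# The two mesos log lines above, as (executor ID, framework ID) pairs.
mesos_log_ids = [
    ("20140822-112818-711206558-5050-25951-0",
     "20140822-112818-711206558-5050-25951-0053"),
    ("20140822-112818-711206558-5050-25951-0",
     "20140822-112818-711206558-5050-25951-0050"),
]

executor_ids = {executor for executor, _ in mesos_log_ids}
framework_ids = {framework for _, framework in mesos_log_ids}

# Spark's error mentions only the shared executor ID, so registrations
# from two distinct frameworks look like "two different block managers".
print(len(executor_ids), len(framework_ids))  # 1 2
```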
Re: multi tenant setup
Hi Niklas,

I am using Apache Spark with mesos 0.19.1. I have limited resources, and when I submit a job it takes all of them. This is fine when no one else is using them, but when one of my colleagues submits his job, I would like Mesos to allocate some of the resources to his job as parts of my job finish. Currently it seems to wait until my whole job is finished before starting the other job. Is this due to Mesos, or do you think Spark is the one blocking the job?

- Gurvinder

On 07/31/2014 06:12 PM, Niklas Nielsen wrote:

Hi Gurvinder,

Frameworks competing for resources will get their (weighted) fair share of the cluster. The allocator in the master uses the Dominant Resource Fairness algorithm to do this (http://static.usenix.org/event/nsdi11/tech/full_papers/Ghodsi.pdf). Regarding FIFO, are you referring to 'local' scheduler policies? How tasks are dispatched is up to the individual framework.

Cheers,
Niklas

On 31 July 2014 07:28, Gurvinder Singh gurvinder.si...@uninett.no wrote:

Hi,

I am wondering how mesos handles task scheduling when resources are limited and multiple users want to access them at the same time. Is there any kind of fair scheduling? Currently I mainly see FIFO. If there is, how can I specify it?

Thanks,
Gurvinder
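One practical knob for the situation described above (my suggestion, not from the thread): in coarse-grained Mesos mode Spark holds on to every core it is offered by default, so capping a job with `spark.cores.max` leaves resources for Mesos to offer to other frameworks. The master address, job file, and the number 16 are placeholder examples:

```
# Cap this Spark job at 16 cores so Mesos can offer the rest to other frameworks.
spark-submit \
  --master mesos://zk://10.0.0.1:2181/mesos \
  --conf spark.cores.max=16 \
  my_job.py
```

This only bounds the greedy framework; the cross-framework split itself is still decided by the master's DRF allocator that Niklas describes.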
multi tenant setup
Hi,

I am wondering how mesos handles task scheduling when resources are limited and multiple users want to access them at the same time. Is there any kind of fair scheduling? Currently I mainly see FIFO. If there is, how can I specify it?

Thanks,
Gurvinder
spark and mesos issue
We are getting this issue when running jobs with close to 1000 workers. Spark is the github version and mesos is 0.19.0:

ERROR storage.BlockManagerMasterActor: Got two different block manager registrations on 201407031041-1227224054-5050-24004-0

Googling about it, it seems that mesos is starting slaves at the same time and giving them the same id. So maybe a bug in mesos?

Thanks,
Gurvinder

On 07/04/2014 01:03 AM, Vinod Kone wrote:

Correct URL: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1

On Thu, Jul 3, 2014 at 1:40 PM, Vinod Kone vinodk...@gmail.com wrote:

Hi,

We are planning to release 0.19.1 (likely next week), which will be a bug-fix release. Specifically, these are the fixes that we are planning to cherry-pick: https://issues.apache.org/jira/issues/?filter=12326191&jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1

If there are other critical fixes that need to be backported to 0.19.1, please reply here as soon as possible.

Thanks,