Re: Deploying containers to every mesos slave node

2015-03-12 Thread Gurvinder Singh
On 03/12/2015 02:00 PM, Tim St Clair wrote:
 You may want to also view
 - https://issues.apache.org/jira/browse/MESOS-1806 
 
 as folks have discussed straight up consul integration on that JIRA. 
Are there any plans to resolve this JIRA for the upcoming 0.22 release?

- Gurvinder
 
 
 
 *From: *Aaron Carey aca...@ilm.com
 *To: *user@mesos.apache.org
 *Sent: *Thursday, March 12, 2015 3:54:52 AM
 *Subject: *Deploying containers to every mesos slave node
 
 Hi All,
 
 In setting up our cluster, we require things like consul to be
 running on all of our nodes. I was just wondering if there was any
 sort of best practice (or a scheduler perhaps) that people could
 share for this sort of thing?
 
 Currently the approach is to use Salt to provision each node and add the
 consul/mesos-slave processes and so on, but it'd be nice to remove the
 dependency on Salt.
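
 Not from the thread, but as an illustration of one scheduler-based
 approach from that era: a Marathon app pinned with a UNIQUE hostname
 constraint approximates "one task per node". The Marathon URL, the
 instance count (it must match your node count), and the consul command
 below are all assumptions, not a recommendation from this list:

 ```shell
 # Hypothetical sketch: ask Marathon to run one consul agent per node
 # by combining instances == number of nodes with a UNIQUE hostname
 # constraint. All names/values here are placeholders.
 curl -X POST http://marathon.example.com:8080/v2/apps \
   -H 'Content-Type: application/json' \
   -d '{
     "id": "consul-agent",
     "cmd": "consul agent -data-dir=/tmp/consul",
     "instances": 10,
     "cpus": 0.1,
     "mem": 64,
     "constraints": [["hostname", "UNIQUE"]]
   }'
 ```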
 
 Thanks,
 Aaron
 
 
 
 
 -- 
 Cheers,
 Timothy St. Clair
 Red Hat Inc.



Re: mesos on coreos

2015-03-11 Thread Gurvinder Singh
Thanks Alex for the information and others too for sharing their
experiences.

- Gurvinder
On 03/11/2015 07:50 PM, Alex Rukletsov wrote:
 Gurvinder,
 
 no, there are no publicly available binaries, nor documentation at this
 point. We will publish them as soon as they are rock solid.
 
 On Wed, Mar 11, 2015 at 2:08 AM, Gurvinder Singh
 gurvinder.si...@uninett.no mailto:gurvinder.si...@uninett.no wrote:
 
 On 03/10/2015 11:41 PM, Tim Chen wrote:
  Hi all,
 
  As Alex said you can run Mesos in CoreOS without Docker if you put in
  the dependencies in.
 
 Tim, is there any documentation on running Mesos outside a container on
 CoreOS, or a binary available which we can wget in the cloud-init file
 to satisfy the dependencies? We would like to test Mesos on CoreOS
 outside Docker.
 
 - Gurvinder
 It is a common ask, though, to run the Mesos slave in a Docker container
 in general, whether on CoreOS or not. It's definitely a bit involved, as
 you need to mount in a directory to persist the work dir, mount in
 /sys/fs for cgroups, and (since Docker 1.5) use the --pid=host flag so
 the slave shares the host pid namespace.
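
 As a concrete sketch of the flags Tim mentions (the image name,
 ZooKeeper URL, and host paths here are assumptions, not from the
 thread):

 ```shell
 # Hedged sketch of running mesos-slave in Docker with the flags
 # discussed above; adjust image, master URL, and paths to your setup.
 docker run -d --name mesos-slave \
   --net=host --pid=host --privileged \
   -v /sys/fs/cgroup:/sys/fs/cgroup \
   -v /var/lib/mesos:/var/lib/mesos \
   mesosphere/mesos-slave:0.21.1 \
   --master=zk://zk1:2181/mesos \
   --work_dir=/var/lib/mesos \
   --containerizers=docker,mesos
 ```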
 
  Although you get a lot less isolation, there are still motivations to
  run slave in Docker regardless.
 
  One thing that's missing from the mesos docker containerizer is that it
  won't be able to recover tasks on restart, and I have a series of
  patches pending review to fix that.
 
  Tim
 
  On Tue, Mar 10, 2015 at 3:16 PM, Alex Rukletsov a...@mesosphere.io 
 mailto:a...@mesosphere.io
  mailto:a...@mesosphere.io mailto:a...@mesosphere.io wrote:
 
  My 2¢.
 
 
 First of all, it doesn’t look like a great idea to package the
 resource manager into Docker, putting one more abstraction layer
 between the resource itself and the resource manager.
 
 
 You can run mesos-slave on a CoreOS node without putting it into a
 Docker container.
 
  —Alex
 
 
 
 



Re: mesos on coreos

2015-03-11 Thread Gurvinder Singh
On 03/10/2015 11:41 PM, Tim Chen wrote:
 Hi all,
 
 As Alex said you can run Mesos in CoreOS without Docker if you put in
 the dependencies in.
 
Tim, is there any documentation on running Mesos outside a container on
CoreOS, or a binary available which we can wget in the cloud-init file to
satisfy the dependencies? We would like to test Mesos on CoreOS outside
Docker.

- Gurvinder
 It is a common ask, though, to run the Mesos slave in a Docker container
 in general, whether on CoreOS or not. It's definitely a bit involved, as
 you need to mount in a directory to persist the work dir, mount in
 /sys/fs for cgroups, and (since Docker 1.5) use the --pid=host flag so
 the slave shares the host pid namespace.
 
 Although you get a lot less isolation, there are still motivations to
 run slave in Docker regardless. 
 
 One thing that's missing from the mesos docker containerizer is that it
 won't be able to recover tasks on restart, and I have a series of
 patches pending review to fix that.
 
 Tim
 
 On Tue, Mar 10, 2015 at 3:16 PM, Alex Rukletsov a...@mesosphere.io
 mailto:a...@mesosphere.io wrote:
 
 My 2¢.
  
 
 First of all, it doesn’t look like a great idea to package the
 resource manager into Docker, putting one more abstraction layer
 between the resource itself and the resource manager.
 
 
 You can run mesos-slave on a CoreOS node without putting it into a
 Docker container.
  
 —Alex
 
 



Re: mesos on coreos

2015-03-10 Thread Gurvinder Singh
Hi Michael,

Yes, I tested the tutorial and it works fine for testing. Later on I used
fleet to run mesos workers on all the coreos machines too. I was wondering
how the landscape is looking in the community regarding coreos: is there
any interest from the community or the mesos team in supporting coreos in
general? If so, how do you see where Mesos fits in alongside Fleet and
Kubernetes?

My current understanding is that Fleet is useful for lightweight
scheduling, whereas Mesos and kubernetes serve a similar purpose. Mesos
has been around for a while and is more feature-complete than kubernetes,
but Kubernetes has tighter integration with coreos, e.g. using etcd for
coordination and flannel for networking. I wonder what the plans are for
Mesos in this regard. I have seen the JIRA for etcd
(https://issues.apache.org/jira/browse/MESOS-1806)

I understand that the landscape is changing fast, but it's good to know
the Mesos roadmap in this regard. I would also love to know if anybody is
using Coreos with Mesos beyond testing.

Thanks,
Gurvinder
On 03/09/2015 11:35 PM, Michael Park wrote:
 Hi Gurvinder,

 We got started on this work at Mesosphere and there's a tutorial
 http://mesosphere.com/docs/tutorials/mesosphere-on-a-single-coreos-instance/
 on
 how to do a single-node setup. We ran the mesos-master and slaves in
 docker containers which led to this JIRA ticket
 https://issues.apache.org/jira/browse/MESOS-2115. I haven't been able
 to follow-up on this article recently, and I'd like to hear about others
 who have made further progress as well.

 At the time, we were thinking that using fleet shouldn't be too
 difficult since it uses the systemd unit files but didn't quite get
 around to it.

 Perhaps you'll find the tutorial to be a decent starting point.

 Thanks,

 MPark.

 On 9 March 2015 at 17:52, Gurvinder Singh gurvinder.si...@uninett.no
 mailto:gurvinder.si...@uninett.no wrote:

 Hi,

 I am wondering if anybody in the community has looked into, or is
 running, mesos on top of coreos. I would be interested to hear about
 your experiences in the following areas:

 - User management on the coreos cluster and containers running with Mesos
 - Are you using fleet to run mesos, or running it as a service via
 cloud-config without fleet at all?
 - Networking among hosts: flannel or something else?
 - Any other interesting insights you found with such a setup

 Thanks,
 Gurvinder





Re: mesos on coreos

2015-03-10 Thread Gurvinder Singh
Thanks Anton for sharing your experience. Responses inline.
On 03/10/2015 01:01 PM, Anton Kirillov wrote:
 Hi Gurvinder,
 
 our team has experience with Mesos on CoreOS with fleet, and we decided
 to switch to bare-metal deployments; here are our main reasons.
 
 First of all, it doesn’t look like a great idea to package the resource
 manager into Docker, putting one more abstraction layer between the
 resource itself and the resource manager.
I agree. That was the main reason I asked about closer integration of
mesos with coreos. If you look here, kubernetes
(https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/coreos/cloud-configs/master.yaml)
runs natively on coreos, not in a container, as it is started by the
cloud-init process. So something similar for mesos would resolve this
issue. Although this is possible for kubernetes because it is a simple Go
binary with no dependencies, I looked at the Mesos library dependencies
and compared them with the libraries on coreos: only 2 are missing
(libmesos-version.so, libsasl2.so). So I think it is possible for mesos
to follow the same model as kubernetes and run natively.
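
A rough way to check this kind of dependency gap yourself: list the
shared libraries a binary needs that the host cannot resolve. The
mesos-slave path below is an assumption; point it at wherever your
binary actually lives.

```shell
# List shared libraries the given binary needs that the host's loader
# cannot find ("not found" entries in ldd output). Prints nothing if
# everything resolves or the binary does not exist.
missing_libs() {
  ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}
missing_libs /usr/sbin/mesos-slave
```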

 
 From a DevOps point of view it is hard to control such things as
 ZooKeeper restarts (and ensemble rolling restarts as well); ZooKeeper is
 the core service discovery mechanism for Mesos. You have to add sidekick
 services to provide peer discovery mechanics, and it doesn’t look very
 robust.
That's why I mentioned the JIRA where mesos could use etcd. The sidekick
issue could be solved by using flannel
(https://github.com/coreos/flannel) to make each container addressable
inside your cluster.
 
 A very common use case with Mesos is running Docker on top of it, either
 with Marathon or with Aurora. But the Docker service needs to be
 installed on the worker nodes, so you end up in a Docker-in-Docker
 situation, which cancels the advantages of both transparent resource
 management and simple deployment configuration.
 
 One more point on Mesos inside Docker: you have to attach the Mesos data
 directories from the container to the host. Given that you’re already
 running the Mesos container in privileged mode and sharing state
 directories with the host, there is no reason left to run Mesos inside a
 container. Also consider container restart (not just failure), with
 registry corruption and the synchronization issues that follow.
 
 Another of our use cases, multi-region cluster deployments, showed some
 issues with etcd heartbeat/leader-election timeouts, which need to be
 increased in order to handle the bigger latencies between data centers.
 With increased timeouts, fleet starts to behave unpredictably, losing
 and re-finding peer nodes, which is not acceptable in a production
 environment.
I have no experience with multi-region deployment. Such a scenario can
also be hard for zookeeper, whereas consul claims to address this issue.

The reason for asking is that with coreos we get a small-footprint,
up-to-date OS which can boot mesos to manage the whole cluster. By using
docker, we can have multi-tenancy support too. Just ideas :P

- Gurvinder

 
 You can take a look at this configuration for Mesos-CoreOS-HA as
 well https://github.com/akirillov/mesos-deploy/tree/master/mesos-coreos-ha
 
 -- 
 Anton Kirillov
 Sent with Sparrow http://www.sparrowmailapp.com/?sig
 
 On Tuesday, March 10, 2015 at 11:08 AM, Gurvinder Singh wrote:
 
 Hi Michael,

 Yes, I tested the tutorial and it works fine for testing. Later on I used
 fleet to run mesos workers on all the coreos machines too. I was wondering
 how the landscape is looking in the community regarding coreos: is there
 any interest from the community or the mesos team in supporting coreos in
 general? If so, how do you see where Mesos fits in alongside Fleet and
 Kubernetes?

 My current understanding is that Fleet is useful for lightweight
 scheduling, whereas Mesos and kubernetes serve a similar purpose. Mesos
 has been around for a while and is more feature-complete than kubernetes,
 but Kubernetes has tighter integration with coreos, e.g. using etcd for
 coordination and flannel for networking. I wonder what the plans are for
 Mesos in this regard. I have seen the JIRA for etcd
 (https://issues.apache.org/jira/browse/MESOS-1806)

 I understand that the landscape is changing fast, but it's good to know
 the Mesos roadmap in this regard. I would also love to know if anybody is
 using Coreos with Mesos beyond testing.

 Thanks,
 Gurvinder
 On 03/09/2015 11:35 PM, Michael Park wrote:
 Hi Gurvinder,

 We got started on this work at Mesosphere and there's a tutorial
 http://mesosphere.com/docs/tutorials/mesosphere-on-a-single-coreos-instance/
 on
 how to do a single-node setup. We ran the mesos-master and slaves in
 docker containers which led to this JIRA ticket
 https://issues.apache.org/jira/browse/MESOS-2115. I haven't been able
 to follow-up on this article recently, and I'd like to hear about others
 who have made further progress as well.

 At the time, we were

Re: mesos on coreos

2015-03-10 Thread Gurvinder Singh
On 03/10/2015 03:57 PM, Anton Kirillov wrote:
 Gurvinder,
 
 your points are really interesting to consider, but to me it still
 looks like a bit of a narrow solution, because not all widespread
 systems have frameworks to run on Mesos. But it really depends on your
 goals.
 
 We have pretty specific use cases; one of them is using Spark on Mesos
 to achieve HA, alongside Cassandra as the datastore. So we install a
 Mesos slave and a Cassandra node on the same machine to achieve greater
 data locality. And Docker is a pretty poor choice for running Cassandra,
 but there is no other way to run Cassandra on CoreOS (afaik).
We also plan to run different frameworks on our cluster, Spark and
Cassandra being among them. I would like to know what issues you faced
while running cassandra in docker, as docker with a volume attached has
almost bare-metal performance.
 
 Another thought is that when you go to “big iron”, the OS footprint
 doesn’t matter a lot when you have multi-core hardware with huge RAM.
 It looks like premature optimization.
 
Optimization is one thing, but with CoreOS you get more deterministic OS
updates with a rollback option, which can be quite useful when running
large infrastructure. The point of the current discussion is to get to
know the community's feeling about these ideas; it is not to say that
this is the best solution :)

- Gurvinder
 My points just come from recent experience and are more
 problem-oriented. But it would be really nice to see Mesos as a native
 CoreOS service to experiment with.
 
 -- Anton Kirillov Sent with Sparrow
 http://www.sparrowmailapp.com/?sig
 
 On Tuesday, March 10, 2015 at 3:12 PM, Gurvinder Singh wrote:
 
 Thanks Anton for sharing your experience. Responses inline. On
 03/10/2015 01:01 PM, Anton Kirillov wrote:
 Hi Gurvinder,
 
 our team has experience with Mesos on CoreOS with fleet, and we
 decided to switch to bare-metal deployments; here are our main
 reasons.
 
 First of all, it doesn’t look like a great idea to package the
 resource manager into Docker, putting one more abstraction layer
 between the resource itself and the resource manager.
 I agree. That was the main reason I asked about closer integration of
 mesos with coreos. If you look here, kubernetes
 (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/coreos/cloud-configs/master.yaml)
 runs natively on coreos, not in a container, as it is started by the
 cloud-init process. So something similar for mesos would resolve this
 issue. Although this is possible for kubernetes because it is a simple
 Go binary with no dependencies, I looked at the Mesos library
 dependencies and compared them with the libraries on coreos: only 2
 are missing (libmesos-version.so, libsasl2.so). So I think it is
 possible for mesos to follow the same model as kubernetes and run
 natively.
 
 
 From a DevOps point of view it is hard to control such things as
 ZooKeeper restarts (and ensemble rolling restarts as well); ZooKeeper
 is the core service discovery mechanism for Mesos. You have to add
 sidekick services to provide peer discovery mechanics, and it doesn’t
 look very robust.
 That's why I mentioned the JIRA where mesos could use etcd. The
 sidekick issue could be solved by using flannel
 (https://github.com/coreos/flannel) to make each container
 addressable inside your cluster.
 
 A very common use case with Mesos is running Docker on top of it,
 either with Marathon or with Aurora. But the Docker service needs to
 be installed on the worker nodes, so you end up in a Docker-in-Docker
 situation, which cancels the advantages of both transparent resource
 management and simple deployment configuration.
 
 One more point on Mesos inside Docker: you have to attach the Mesos
 data directories from the container to the host. Given that you’re
 already running the Mesos container in privileged mode and sharing
 state directories with the host, there is no reason left to run Mesos
 inside a container. Also consider container restart (not just
 failure), with registry corruption and the synchronization issues
 that follow.
 
 Another of our use cases, multi-region cluster deployments, showed
 some issues with etcd heartbeat/leader-election timeouts, which need
 to be increased in order to handle the bigger latencies between data
 centers. With increased timeouts, fleet starts to behave
 unpredictably, losing and re-finding peer nodes, which is not
 acceptable in a production environment.
 I have no experience with multi-region deployment. Such a scenario
 can also be hard for zookeeper, whereas consul claims to address this
 issue.
 
 The reason for asking is that with coreos we get a small-footprint,
 up-to-date OS which can boot mesos to manage the whole cluster. By
 using docker, we can have multi-tenancy support too. Just ideas :P
 
 - Gurvinder
 
 
 You can take a look at this configuration for Mesos-CoreOS-HA
 as well 
 https://github.com/akirillov/mesos-deploy/tree/master/mesos-coreos-ha


 
-- 
 Anton Kirillov Sent with Sparrow

mesos on coreos

2015-03-09 Thread Gurvinder Singh
Hi,

I am wondering if anybody in the community has looked into, or is
running, mesos on top of coreos. I would be interested to hear about your
experiences in the following areas:

- User management on the coreos cluster and containers running with Mesos
- Are you using fleet to run mesos, or running it as a service via
cloud-config without fleet at all?
- Networking among hosts: flannel or something else?
- Any other interesting insights you found with such a setup

Thanks,
Gurvinder


Re: logstash config

2015-03-05 Thread Gurvinder Singh
This is the config we use for mesos logs

MESOSTIMESTAMP %{MONTHNUM}%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}

MESOSLOG %{DATA:loglevel}%{MESOSTIMESTAMP:timestamp}\s+%{POSINT:pid}\s+%{NOTSPACE:class}:

and in the logstash config:

if [type] == "mesos" {
  grok {
    patterns_dir => ["<path to your patterns dir>"]
    match => ["message", "%{MESOSLOG}"]
  }
}
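
For reference, the lines these patterns target are glog-style: a
severity letter, MMDD, HH:MM:SS.micros, a pid, and file:line. A quick
sanity check against a made-up sample line, using a plain regex that
roughly mirrors the grok pattern above (illustration only):

```shell
# Verify a sample glog-style Mesos line against a rough regex
# equivalent of the MESOSLOG grok pattern; prints "pattern matches".
sample='I0306 03:40:12.456789  1234 master.cpp:1083] Elected as leader'
echo "$sample" |
  grep -qE '^[A-Z][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ +[0-9]+ +[^ ]+:' &&
  echo 'pattern matches'
```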

- Gurvinder
On 03/06/2015 03:40 AM, David J. Palaitis wrote:
 Anyone out there have a logstash config for Mesos log format they'd like
 to share? I'm finding the date format stubbornly difficult to map to
 timestamp.




Re: spark and mesos issue

2014-09-16 Thread Gurvinder Singh
It might not be related only to the memory issue. The memory issue is
also there, as you mentioned; I have seen that one too. The fine-grained
mode issue is mainly in Spark, considering that it got two different
block managers for the same ID, whereas if I search for the ID on the
mesos slaves, it exists only on one slave, not on multiple of them. This
might be due to the size of the ID, as Spark outputs the error as

14/09/16 08:04:29 ERROR BlockManagerMasterActor: Got two different
block manager registrations on 20140822-112818-711206558-5050-25951-0

whereas in the mesos slave logs I see entries such as

I0915 20:55:18.293903 31434 containerizer.cpp:392] Starting container
'3aab2237-d32f-470d-a206-7bada454ad3f' for executor
'20140822-112818-711206558-5050-25951-0' of framework
'20140822-112818-711206558-5050-25951-0053'

I0915 20:53:28.039218 31437 containerizer.cpp:392] Starting container
'fe4b344f-16c9-484a-9c2f-92bd92b43f6d' for executor
'20140822-112818-711206558-5050-25951-0' of framework
'20140822-112818-711206558-5050-25951-0050'


The last 3 digits of the ID are missing in Spark, whereas they are
different in the mesos slaves.
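
A sketch of how one might confirm this from slave logs: extract the
distinct framework IDs that launched an executor with a given
(truncated) executor ID. It reads log lines on stdin; the example ID is
the one from this thread, and the log path is up to you.

```shell
# Given mesos-slave log lines on stdin, print the distinct framework
# IDs that started an executor with the executor ID passed as $1.
frameworks_for_executor() {
  grep "for executor '$1'" |
    sed -n "s/.*of framework '\([^']*\)'.*/\1/p" |
    sort -u
}
# e.g.:
# frameworks_for_executor '20140822-112818-711206558-5050-25951-0' < mesos-slave.INFO
```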

- Gurvinder
On 09/15/2014 11:13 PM, Brenden Matthews wrote:
 I started hitting a similar problem, and it seems to be related to 
 memory overhead and tasks getting OOM killed.  I filed a ticket
 here:
 
 https://issues.apache.org/jira/browse/SPARK-3535
 
 On Wed, Jul 16, 2014 at 5:27 AM, Ray Rodriguez
 rayrod2...@gmail.com mailto:rayrod2...@gmail.com wrote:
 
 I'll set some time aside today to gather and post some logs and 
 details about this issue from our end.
 
 
 On Wed, Jul 16, 2014 at 2:05 AM, Vinod Kone vinodk...@gmail.com 
 mailto:vinodk...@gmail.com wrote:
 
 
 
 
 On Tue, Jul 15, 2014 at 11:02 PM, Vinod Kone vi...@twitter.com 
 mailto:vi...@twitter.com wrote:
 
 
 On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh 
 gurvinder.si...@uninett.no mailto:gurvinder.si...@uninett.no
 wrote:
 
 ERROR storage.BlockManagerMasterActor: Got two different block
 manager registrations on 201407031041-1227224054-5050-24004-0
 
 From googling, it seems that mesos is starting slaves at the same time
 and giving them the same ID. So maybe a bug in mesos?
 
 
 Has this issue been resolved? We need more information to triage
 this. Maybe some logs that show the lifecycle of the duplicate
 instances?
 
 
 @vinodkone
 
 
 
 



Re: multi tenant setup

2014-08-01 Thread Gurvinder Singh
Hi Niklas,

I am using Apache Spark with mesos 0.19.1. I have limited resources, and
when I submit a job it takes all of them. This is fine when no one else
is using them, but when one of my colleagues submits his job, I would
like mesos to assign some of the resources to his job as parts of my job
finish. Currently, though, it seems to wait until my whole job is
finished before starting the other job. Is this due to mesos, or do you
think Spark is the one blocking the job?
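
For what it's worth, a common mitigation in this situation (my
assumption, not something stated in the thread) was to cap the first
job's footprint via Spark configuration so offers remain available for
other frameworks; a sketch using Spark 1.x property names:

```
# spark-defaults.conf fragment (a sketch): cap total cores held by one
# job and use fine-grained Mesos mode so cores are released as tasks
# finish. The master URL is a placeholder.
spark.master          mesos://zk://zk1:2181/mesos
spark.cores.max       16
spark.mesos.coarse    false
```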

- Gurvinder
On 07/31/2014 06:12 PM, Niklas Nielsen wrote:
 Hi Gurvinder,
 
 The frameworks competing for resources will get their (weighted) fair
 share of the cluster. The allocator in the master uses the Dominant
 Resource Fairness algorithm to do this
 (http://static.usenix.org/event/nsdi11/tech/full_papers/Ghodsi.pdf).
 Regarding FIFO, are you referring to 'local' scheduler policies? How
 tasks are dispatched is up to the individual framework.
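
The DRF idea Niklas references can be sketched numerically: each
framework's dominant share is its largest fractional use of any single
resource, and the allocator offers resources next to the framework with
the smallest dominant share. A toy example with invented numbers (a
cluster of 32 cpus and 128 GB mem):

```shell
# Compute each framework's dominant share (cpus used / 32 vs mem used
# / 128, whichever is larger) and print the framework that would be
# offered resources next, i.e. the one with the smallest share.
printf '%s\n' \
  'spark-a 24 32' \
  'spark-b 2 64' |
awk '{ share = ($2 / 32 > $3 / 128) ? $2 / 32 : $3 / 128
       if (min == "" || share < min) { min = share; who = $1 } }
     END { print who }'
```

Here spark-a's dominant share is 24/32 = 0.75 (cpus) and spark-b's is
64/128 = 0.5 (mem), so spark-b is next in line.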
 
 Cheers,
 Niklas
 
 
 On 31 July 2014 07:28, Gurvinder Singh gurvinder.si...@uninett.no
 mailto:gurvinder.si...@uninett.no wrote:
 
 Hi,
 
 I am wondering how mesos handles task scheduling when resources are
 limited and multiple users want to access them at the same time. Is
 there any kind of fair scheduling? Currently I mainly see FIFO. If
 there is, how can I specify that?
 
 Thanks,
 Gurvinder
 
 



multi tenant setup

2014-07-31 Thread Gurvinder Singh
Hi,

I am wondering how mesos handles task scheduling when resources are
limited and multiple users want to access them at the same time. Is
there any kind of fair scheduling? Currently I mainly see FIFO. If there
is, how can I specify that?

Thanks,
Gurvinder


spark and mesos issue

2014-07-04 Thread Gurvinder Singh
We are getting this issue when we run jobs with close to 1000 workers.
Spark is the GitHub version and mesos is 0.19.0.

ERROR storage.BlockManagerMasterActor: Got two different block manager
registrations on 201407031041-1227224054-5050-24004-0

From googling, it seems that mesos is starting slaves at the same time
and giving them the same ID. So maybe a bug in mesos?

Thanks,
Gurvinder
On 07/04/2014 01:03 AM, Vinod Kone wrote:
 correct url:
 
 https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1
 
 
 On Thu, Jul 3, 2014 at 1:40 PM, Vinod Kone vinodk...@gmail.com
 mailto:vinodk...@gmail.com wrote:
 
 Hi,
 
 We are planning to release 0.19.1 (likely next week) which will be a
 bug fix release. Specifically, these are the fixes that we are
 planning to cherry pick.
 
 
 https://issues.apache.org/jira/issues/?filter=12326191&jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1
 
 If there are other critical fixes that need to be backported to
 0.19.1 please reply here as soon as possible.
 
 Thanks,