Hey,

Could you elaborate on what you mean by "delays and health check problems"? Are you using your own framework or an existing one? How are you launching the tasks?
Could you share logs from Mesos that show timeouts to ZK? For reference, I operate a large Mesos cluster and have never encountered problems when running 1k tasks concurrently, so I think sharing data would help everyone debug this problem.

On Fri, Dec 16, 2016 at 6:05 AM, Kiril Menshikov <[email protected]> wrote:

> Hi,
>
> Does any body try to run Mesos on AWS instances? Can you give me
> recommendations.
>
> I am developing elastic (scale aws instances on demand) Mesos cluster.
> Currently I have 3 master instances. I run about 1000 tasks simultaneously.
> I see delays and health check problems.
>
> ~400 tasks fits in one m4.10xlarge instance. (160GB RAM, 40 CPU).
>
> At the moment I increase time out in ZooKeeper cluster. What can I do to
> decrease timeouts?
>
> Also how can I increase performance? The main bottleneck is what I have
> the big amount of tasks(run simultaneously) for an hour after I shutdown
> them or restart (depends how good them perform).
>
> -Kiril
>
> --
> Zameer Manji
>

