We run 100% on AWS and have been running Mesos in production since version 0.19.
Our cluster consists of 3 dedicated ZooKeeper nodes (m3.2xl), 3 dedicated masters (m3.2xl), 8 dedicated slaves (m4.4xl) and 2 HAProxy instances (m4.medium), which are used in conjunction with marathon-lb to route requests to backend services running on Mesos. We use Terraform, a HashiCorp tool, to build the physical cluster nodes, and Ansible to configure Mesos, Chronos, Marathon and Mesos-DNS. For monitoring we use Datadog, which has built-in integrations for tracking various stats in the cluster such as CPU, disk, memory, roles, etc.

As for optimization, we currently run two different workloads: ELT (Spark/MR/Hadoop) and Scala-based microservices. I've since started using different attributes to prevent my batch-oriented jobs from consuming too many resources and, at times, blocking my real-time microservices. Instead of running all services across all nodes, I use constraints in both Marathon and Chronos, which basically partitions my servers into two groups.

The only issue we ran into while running on AWS was the sizing of our masters. Since I knew from the start that the masters would be dedicated nodes, I initially used m3.medium instances, which ended up being way too small. We saw noisy-neighbor issues: CPU steal was consistently high (~50%), which caused huge latency and timeouts between the masters, slaves and ZooKeeper. After replacing the m3.mediums with m4.2xl instances, this issue went away.

Let me know if you have any specific questions.

--RB

> On Jan 10 2016, at 2:27 am, lwq Adolph <kenan3...@gmail.com> wrote:
>
> Hi everyone:
>
> My future Mesos cluster will be at least 100 nodes, so optimizing Mesos is important. Could you share your experience using Mesos in a production environment? It could cover the following topics:
>
> 1. monitoring tools for a Mesos cluster
>
> 2. optimization of Mesos parameters
>
> Thanks very much
>
> --
>
> Thanks & Best Regards
>
> 卢文泉 | Adolph Lu
>
> TEL:+86 15651006559
>
> Linker Networks (<http://www.linkernetworks.com/>)
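To make the attribute/constraint split described above concrete, here is a minimal sketch in Python that builds JSON bodies for a Marathon app and a Chronos job pinned to one group of slaves. It assumes each slave is started with a hypothetical attribute such as `--attributes="workload:realtime"` or `--attributes="workload:batch"`. The attribute name `workload`, the app/job names and the commands are all made up for illustration. Marathon's `CLUSTER` operator and Chronos's `EQUALS` operator both restrict placement to agents whose attribute has the given value. Only a minimal subset of fields is shown, so check the Marathon and Chronos docs for your versions before POSTing these.

```python
import json

# Hypothetical attribute name: slaves would be started with e.g.
#   --attributes="workload:realtime"  or  --attributes="workload:batch"
ATTRIBUTE = "workload"


def marathon_app(app_id, cmd, cpus, mem, group):
    """Minimal Marathon app definition pinned to slaves whose attribute equals `group`."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem,
        "instances": 1,
        # CLUSTER: only place tasks on agents where ATTRIBUTE == group.
        "constraints": [[ATTRIBUTE, "CLUSTER", group]],
    }


def chronos_job(name, command, schedule, group):
    """Minimal Chronos job definition pinned to the same attribute group."""
    return {
        "name": name,
        "command": command,
        # ISO 8601 repeating interval, e.g. daily at 02:00 UTC.
        "schedule": schedule,
        # EQUALS: only run on agents where ATTRIBUTE == group.
        "constraints": [[ATTRIBUTE, "EQUALS", group]],
    }


if __name__ == "__main__":
    # Hypothetical real-time microservice and batch ELT job.
    api = marathon_app("/orders-service", "java -jar orders.jar", 1.0, 1024, "realtime")
    etl = chronos_job("nightly-elt", "spark-submit elt.py",
                      "R/2016-01-11T02:00:00Z/PT24H", "batch")
    print(json.dumps(api, indent=2))
    print(json.dumps(etl, indent=2))
```

Keeping the attribute name in one place means both schedulers agree on how the cluster is partitioned; moving a slave between groups is then just a matter of changing its `--attributes` flag.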
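The CPU steal problem on the undersized m3.medium masters can also be checked directly on a node. Here is a minimal sketch, assuming a Linux host. It samples the aggregate `cpu` line of `/proc/stat` twice and reports the share of elapsed CPU time that was stolen by the hypervisor, which is roughly the figure `top` shows as `st`. The function names are hypothetical helpers, not part of any Mesos or Datadog tooling.

```python
import os
import time


def parse_cpu_line(line):
    """Return (steal, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Fields: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already counted in user/nice, so exclude them.
    counters = values[:8]
    steal = counters[7] if len(counters) > 7 else 0  # very old kernels lack steal
    return steal, sum(counters)


def steal_percent(before, after):
    """Percentage of CPU time stolen between two /proc/stat 'cpu' samples."""
    s0, t0 = parse_cpu_line(before)
    s1, t1 = parse_cpu_line(after)
    elapsed = t1 - t0
    return 0.0 if elapsed <= 0 else 100.0 * (s1 - s0) / elapsed


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        first = read_cpu_line()
        time.sleep(1.0)
        second = read_cpu_line()
        print(f"CPU steal over the last second: {steal_percent(first, second):.1f}%")
    else:
        print("/proc/stat not found; this sketch only works on Linux")
```

A sustained value in the tens of percent on a master means the instance is not getting the CPU it is billed for, which matches the latency and timeouts described above.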