Is there any warning or error message in the Marathon logs when it takes a long time to deploy or redeploy your microservice? It is also worth taking a look at the Mesos slave logs.
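To rule out the constraints in the quoted job definition as the cause, here is a minimal Python sketch of how they narrow the set of usable slaves. It is only a model, not Marathon's code: the host names are hypothetical, and it assumes that LIKE is a full regular-expression match on the attribute value and that UNIQUE excludes any host already running an instance of the app. It ignores CPU, memory and port availability.

```python
import re

# Rack attribute per slave, as described in the quoted message.
# The host names "server1".."server9" are hypothetical placeholders.
SLAVES = {
    "server1": "spark", "server2": "spark", "server3": "spark",
    "server4": "spark", "server8": "spark",
    "server5": "ms", "server6": "ms", "server7": "ms", "server9": "ms",
}


def eligible_hosts(slaves, rack_pattern, used_hosts=()):
    """Return hosts that pass ["rack", "LIKE", pattern] and ["hostname", "UNIQUE"].

    Assumption: LIKE is a full regex match on the attribute value, and
    UNIQUE rules out any host that already runs an instance of the app.
    """
    return sorted(
        host
        for host, rack in slaves.items()
        if re.fullmatch(rack_pattern, rack) and host not in used_hosts
    )


if __name__ == "__main__":
    # Fresh deploy: no instance placed yet.
    print(eligible_hosts(SLAVES, "ms"))
    # Rolling restart while both old instances are still running.
    print(eligible_hosts(SLAVES, "ms", used_hosts={"server5", "server6"}))
```

Under these assumptions two "rack:ms" hosts remain eligible even in the middle of a restart, so the constraints by themselves would not explain a 10 minute delay, which is why the logs are the next place to look.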
On Tue, Feb 2, 2016 at 6:55 AM, Rodrick Brown <[email protected]> wrote:

> My cluster consist of 9 slaves server split in 1/2 for two primary
> applications (Spark | Scala Microservices)
>
> - Spark - (server 1,2,3,4,8) attributes: "rack:spark"
> - Long running Microservices (server 5,6,7,9) attributes "rack:ms"
>
> The spark jobs run in coarse mode and the majority of them are short lived
> they run for about ~10-15 minutes via Chronos and shutdown. They start
> every 15 minutes about ~45 jobs.
>
> We do lots of deploys daily mostly to the "rack:ms" nodes where these jobs
> are started via Marathon and run until we need to deploy a new release of
> code.
>
> Recently I started noticing jobs are taking forever to restart or startup
> like they're not receiving valid offers.
> The cluster resources consists of the following resources I always have
> more than enough idle resources available to bring up/down new services yet
> I've seen one scenario where a service took almost 10 minutes to restart.
>
>            CPUs    Mem
> Total      120     456.8 GB
> Used       53.6    140.5 GB
> Offered    0       0 B
> Idle       66.4    316.3 GB
>
> How can I combat this delay? I'm not using roles could this be the
> problem?
> Chronos jobs always seem to run fine but they require much less resource
> than my long running Scala services.
> Here is a sample job definition for in Marathon.
>
> {
>   "id": "production/index-service",
>   "cmd": "env && /opt/orchard/production/index-server/bin/run_jar.sh",
>   "cpus": 1.0,
>   "mem": 4096,
>   "disk": 1000,
>   "user": "orchard",
>   "instances": 2,
>   "constraints": [
>     [
>       "hostname", "UNIQUE"
>     ],
>     [
>       "rack", "LIKE", "ms"
>     ]
>   ],
>   "requirePorts": true,
>   "labels": {
>     "ENV": "production",
>     "HAPROXY_GROUP": "microservice"
>   },
>   "ports": [
>     31703,
>     31803,
>     31903
>   ],
>   "maxLaunchDelaySeconds": 3,
>   "backoffFactor": 1.20,
>   "healthChecks": [
>     {
>       "gracePeriodSeconds": 3,
>       "intervalSeconds": 5,
>       "maxConsecutiveFailures": 3,
>       "protocol": "TCP",
>       "portIndex": 1,
>       "timeoutSeconds": 5
>     }
>   ],
>   "upgradeStrategy": {
>     "minimumHealthCapacity": 0.5,
>     "maximumOverCapacity": 0.2
>   }
> }
>
> Any advice appreciated thanks.

