Thanks for reporting! Can you be more specific about which component crashes a lot? Is it the framework, the master, the agent, or the executor? As Artem and Vinod mentioned, it would be really helpful if you could provide the relevant logs (master, agent, or executor) so that we can pinpoint the issue.

- Jie

On Thu, Mar 17, 2016 at 1:45 AM, Guillermo Rodriguez <gu...@spritekin.com> wrote:

> Update to 0.27.2 or wait for 0.28.0.
>
> I experienced many crashes as well with 0.27.1 due to crashes in the
> frameworks bringing down the whole cluster (swarm specially). Also problems
> in the resource precision that also crashed the servers and crashes when
> nodes disconnected.
>
> I really found 0.27 very unstable.
>
> Many of this problems were solved for 0.27.2 and my latest environment has
> proven way more stable. It is still not fully stable as the cluster crashed
> yesterday due to a crash in marathon, but way better overall and quick to
> recover.
>
> Luck!
> Guimo
>
>
> ------------------------------
> *From*: "Klaus Ma" <klaus1982...@gmail.com>
> *Sent*: Thursday, March 17, 2016 1:36 PM
> *To*: user@mesos.apache.org
> *Cc*: "Gabriel Menegatti" <gabr...@simbioseventures.com>
> *Subject*: Re: Unstability on Mesos 0.27
>
> If Mesos daemon crashed, I'd suggest to log a JIRA and append more detail,
> e.g. steps, master/agent log.
>
> ----
> Da (Klaus), Ma (??) | PMP® | Advisory Software Engineer
> Platform OpenSource Technology, STG, IBM GCG
> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>
> On Thu, Mar 17, 2016 at 8:26 AM, Vinod Kone <vinodk...@apache.org> wrote:
>>
>> Hey Gabriel,
>>
>> Could you share more details on what the crashes are and what your setup
>> is (docker containerizer?). Any logs (master, agent, application) that can
>> shed light would be useful to diagnose.
>>
>> On Wed, Mar 16, 2016 at 5:12 PM, Alfredo Carneiro <
>> alfr...@simbioseventures.com> wrote:
>>>
>>> Hello guys,
>>>
>>> I am using Mesos 0.27 with different kinds of applications, such as,
>>> crawlers, databases and websites. However, I have faced many crashes and I
>>> couldn't find what it is the matter.
>>>
>>> We have 14 machines with 8Gb of ram and 4 cpu each. Usually, we run
>>> about 40 instance of our crawler, which they start stopping of nowhere (but
>>> the containers keep running). The day before yesterday we decided try to
>>> test our entire infrastrcuture and we scaled our crawler up to 110
>>> instances. Unfortunately, today we've faced a big crash that affected
>>> mainly our crawler and our databases.
>>>
>>> So, I am wondering if anyone else have the same problem, such as apps
>>> which crashes of nowhere or something else which could be related to some
>>> unstability on Mesos.
>>>
>>> --
>>> Alfredo Miranda
>>>
>>>
>>