RE: Suddenly all tasks gone, framework at completed, cannot start framework
Is there a way to change this failover_timeout after the framework is running, for example via the API? I see that it gets changed when the leader changes.

-----Original Message-----
To: user
Cc: cf.natali; janiszt
Subject: RE: Suddenly all tasks gone, framework at completed, cannot start framework

Thanks Tomek, Charles,

I increased my MARATHON_FAILOVER_TIMEOUT from a day to a week. I can hardly believe that something happened yesterday that made everything go down today. However, I have recently been testing with JAVA_OPTS to prevent OOMs from the Marathon tasks.
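As a starting point for checking what value the master currently holds, here is a minimal sketch that reads failover_timeout out of a master state document. It assumes the JSON returned by the Mesos master's state endpoint contains `frameworks` and `completed_frameworks` lists whose entries carry `name` and `failover_timeout` fields; the sample document and the helper name `failover_timeouts` are hypothetical. It only reads the value and does not answer whether it can be changed on a running framework.

```python
import json


def failover_timeouts(state: dict) -> dict:
    """Map framework name -> (failover_timeout, section it was found in)."""
    result = {}
    for section in ("frameworks", "completed_frameworks"):
        for fw in state.get(section, []):
            # Fall back to the framework id when no name is present.
            key = fw.get("name", fw.get("id"))
            result[key] = (fw.get("failover_timeout"), section)
    return result


# Hypothetical, trimmed-down sample of a master state document.
SAMPLE_STATE = json.loads("""
{
  "frameworks": [],
  "completed_frameworks": [
    {"id": "example-framework-id", "name": "marathon", "failover_timeout": 86400.0}
  ]
}
""")

for name, (timeout, section) in failover_timeouts(SAMPLE_STATE).items():
    print(f"{name}: failover_timeout={timeout} ({section})")
```

In practice the dict would come from the master (for example, saved from its state endpoint into a file and loaded with `json.load`) rather than from the inline sample.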
RE: Suddenly all tasks gone, framework at completed, cannot start framework
Thanks Tomek, Charles,

I increased my MARATHON_FAILOVER_TIMEOUT from a day to a week. I can hardly believe that something happened yesterday that made everything go down today. However, I have recently been testing with JAVA_OPTS to prevent OOMs from the Marathon tasks.

-----Original Message-----
From: Tomek Janiszewski [mailto:jani...@gmail.com]
Sent: Tuesday, 25 August 2020 16:55
To: user
Subject: Re: Suddenly all tasks gone, framework at completed, cannot start framework

See: https://stackoverflow.com/a/42544023/1387612

Tue, 25 Aug 2020 at 15:07 Marc Roos wrote:

Today all my tasks are down and framework marathon is at completed. Any idea how this can happen?

ed.cpp:520] Successfully authenticated with master master@192.168.10.151:5050
I0825 13:03:27.961248 108 sched.cpp:1188] Got error 'Framework has been removed'
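For reference, a small sketch of the arithmetic behind "from a day to a week". It assumes the timeout is given in whole seconds, which is how Marathon documents its failover timeout option; check this against the version you run. The helper name `export_line` is hypothetical.

```python
from datetime import timedelta

# Assumption: the timeout value is expressed in whole seconds.
ONE_DAY = int(timedelta(days=1).total_seconds())
ONE_WEEK = int(timedelta(weeks=1).total_seconds())


def export_line(seconds: int) -> str:
    """Format a shell export line for the variable named in the thread."""
    return f"export MARATHON_FAILOVER_TIMEOUT={seconds}"


print(export_line(ONE_DAY))
print(export_line(ONE_WEEK))
```

With a one-day value, a scheduler that stays disconnected from the master for more than 24 hours would be past its timeout, which fits the observation that a problem from the day before only showed up the next day.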
Re: Suddenly all tasks gone, framework at completed, cannot start framework
See: https://stackoverflow.com/a/42544023/1387612

Tue, 25 Aug 2020 at 15:07 Marc Roos wrote:

> Today all my tasks are down and framework marathon is at completed. Any
> idea how this can happen?
>
> ed.cpp:520] Successfully authenticated with master master@192.168.10.151:5050
> I0825 13:03:27.961248 108 sched.cpp:1188] Got error 'Framework has been removed'
Re: Suddenly all tasks gone, framework at completed, cannot start framework
Marc,

Have you read https://mesos.readthedocs.io/en/1.1.0/high-availability-framework-guide/, in particular the section about the FrameworkInfo failover_timeout?

Cheers,
Charles

On Tue, 25 Aug 2020, 16:01 Marc Roos, wrote:

> I assume this was because something happened with zookeeper, and it
> restarted loading the wrong configuration file without the quorum=1.
> Because I was testing with different zookeeper rpms (mesos rpm conf is
> not standard location)
>
> Question: Is this by design that all tasks are terminated when zookeeper
> is gone? Is there some timeout setting that allows tasks to run for a
> day without zookeeper
>
> -----Original Message-----
> To: user
> Subject: Suddenly all tasks gone, framework at completed, cannot start
> framework
>
> Today all my tasks are down and framework marathon is at completed. Any
> idea how this can happen?
>
> ed.cpp:520] Successfully authenticated with master master@192.168.10.151:5050
> I0825 13:03:27.961248 108 sched.cpp:1188] Got error 'Framework has been removed'
RE: Suddenly all tasks gone, framework at completed, cannot start framework
I assume this was because something happened with ZooKeeper, and it restarted loading the wrong configuration file without the quorum=1, because I was testing with different ZooKeeper RPMs (the Mesos RPM's conf is not in the standard location).

Question: Is it by design that all tasks are terminated when ZooKeeper is gone? Is there some timeout setting that allows tasks to run for a day without ZooKeeper?

-----Original Message-----
To: user
Subject: Suddenly all tasks gone, framework at completed, cannot start framework

Today all my tasks are down and framework marathon is at completed. Any idea how this can happen?

ed.cpp:520] Successfully authenticated with master master@192.168.10.151:5050
I0825 13:03:27.961248 108 sched.cpp:1188] Got error 'Framework has been removed'