RE: Suddenly all tasks gone, framework at completed, cannot start framework

2020-11-11 Thread Marc Roos


Is there a way to change this failover_timeout after the framework is 
running, for example via the API? I see that it changes when the leader 
changes.


-Original Message-
To: user
Cc: cf.natali; janiszt
Subject: RE: Suddenly all tasks gone, framework at completed, cannot 
start framework


Thanks Tomek, Charles. I increased my MARATHON_FAILOVER_TIMEOUT from a 
day to a week. I can hardly believe that something happened yesterday 
that made everything go down today. However, I have recently been testing 
JAVA_OPTS settings to prevent OOMs in the marathon tasks.





RE: Suddenly all tasks gone, framework at completed, cannot start framework

2020-08-25 Thread Marc Roos


Thanks Tomek, Charles. I increased my MARATHON_FAILOVER_TIMEOUT from a 
day to a week. I can hardly believe that something happened yesterday 
that made everything go down today. However, I have recently been testing 
JAVA_OPTS settings to prevent OOMs in the marathon tasks.
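
For reference, Marathon takes its failover timeout in seconds (the
`--failover_timeout` flag, which MARATHON_FAILOVER_TIMEOUT sets through the
environment). A minimal sketch of the day-to-week change described above;
the helper name is hypothetical and only illustrates the arithmetic:

```python
from datetime import timedelta


def failover_timeout_seconds(duration: timedelta) -> int:
    """Convert a duration to whole seconds (hypothetical helper)."""
    return int(duration.total_seconds())


old_value = failover_timeout_seconds(timedelta(days=1))
new_value = failover_timeout_seconds(timedelta(weeks=1))

# Environment lines as they might appear in a Marathon service config.
print(f"MARATHON_FAILOVER_TIMEOUT={old_value}")  # previous setting: one day
print(f"MARATHON_FAILOVER_TIMEOUT={new_value}")  # new setting: one week
```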




-Original Message-
From: Tomek Janiszewski [mailto:jani...@gmail.com] 
Sent: Tuesday, 25 August 2020 16:55
To: user
Subject: Re: Suddenly all tasks gone, framework at completed, cannot 
start framework

See: https://stackoverflow.com/a/42544023/1387612

On Tue, 25 Aug 2020 at 15:07, Marc Roos wrote:




Today all my tasks are down and the marathon framework is listed as 
completed. Any idea how this can happen?



ed.cpp:520] Successfully authenticated with master master@192.168.10.151:5050
I0825 13:03:27.961248   108 sched.cpp:1188] Got error 'Framework has been removed'






Re: Suddenly all tasks gone, framework at completed, cannot start framework

2020-08-25 Thread Tomek Janiszewski
See: https://stackoverflow.com/a/42544023/1387612

On Tue, 25 Aug 2020 at 15:07, Marc Roos wrote:

>
>
> Today all my tasks are down and the marathon framework is listed as
> completed. Any idea how this can happen?
>
>
>
> ed.cpp:520] Successfully authenticated with master
> master@192.168.10.151:5050
> I0825 13:03:27.961248   108 sched.cpp:1188] Got error 'Framework has
> been removed'
>
>


Re: Suddenly all tasks gone, framework at completed, cannot start framework

2020-08-25 Thread Charles-François Natali
Marc,

Have you read
https://mesos.readthedocs.io/en/1.1.0/high-availability-framework-guide/ in
particular the section about the FrameworkInfo failover_timeout?
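
As a rough illustration of what that section describes: the master waits up
to failover_timeout for a disconnected scheduler to re-register, and once
the timeout expires it removes the framework and kills its tasks. The sketch
below is a simplified model of that rule, not Mesos code; the function name
and the 30-hour outage are hypothetical.

```python
def framework_survives(outage_seconds: float, failover_timeout_seconds: float) -> bool:
    """Return True if the scheduler reconnects before the timeout expires."""
    return outage_seconds < failover_timeout_seconds


ONE_DAY = 24 * 60 * 60
ONE_WEEK = 7 * ONE_DAY

outage = 30 * 60 * 60  # hypothetical: scheduler unreachable for 30 hours

print(framework_survives(outage, ONE_DAY))   # False: framework is removed
print(framework_survives(outage, ONE_WEEK))  # True: framework is kept
```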

Cheers,

Charles



On Tue, 25 Aug 2020, 16:01 Marc Roos wrote:

>
>
>
> I assume this was because something happened with zookeeper: it
> restarted and loaded the wrong configuration file, one without the
> quorum=1, because I was testing with different zookeeper rpms (the conf
> from the mesos rpm is not in the standard location).
>
> Question: Is it by design that all tasks are terminated when zookeeper
> is gone? Is there some timeout setting that allows tasks to keep running
> for a day without zookeeper?
>
>
>
>
>
> -Original Message-
> To: user
> Subject: Suddenly all tasks gone, framework at completed, cannot start
> framework
>
>
>
> Today all my tasks are down and the marathon framework is listed as
> completed. Any idea how this can happen?
>
>
>
> ed.cpp:520] Successfully authenticated with master
> master@192.168.10.151:5050
> I0825 13:03:27.961248   108 sched.cpp:1188] Got error 'Framework has
> been removed'
>
>
>
>


RE: Suddenly all tasks gone, framework at completed, cannot start framework

2020-08-25 Thread Marc Roos




I assume this was because something happened with zookeeper: it 
restarted and loaded the wrong configuration file, one without the 
quorum=1, because I was testing with different zookeeper rpms (the conf 
from the mesos rpm is not in the standard location).

Question: Is it by design that all tasks are terminated when zookeeper 
is gone? Is there some timeout setting that allows tasks to keep running 
for a day without zookeeper?





-Original Message-
To: user
Subject: Suddenly all tasks gone, framework at completed, cannot start 
framework



Today all my tasks are down and the marathon framework is listed as 
completed. Any idea how this can happen?



ed.cpp:520] Successfully authenticated with master master@192.168.10.151:5050
I0825 13:03:27.961248   108 sched.cpp:1188] Got error 'Framework has been removed'
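
The log lines above follow the glog layout (severity letter, month and day,
time, thread id, source file and line, then the message). A small sketch for
spotting this error in a scheduler log, assuming that layout; the function
names are hypothetical and the sample is the line quoted in this thread:

```python
import re

# glog prefix: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_glog_line(line: str):
    """Return the fields of a glog-style line as a dict, or None if it does not match."""
    match = GLOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def framework_was_removed(line: str) -> bool:
    """True if the line reports that the master has removed the framework."""
    entry = parse_glog_line(line)
    return entry is not None and "Framework has been removed" in entry["message"]


sample = (
    "I0825 13:03:27.961248   108 sched.cpp:1188] "
    "Got error 'Framework has been removed'"
)
print(parse_glog_line(sample))
print(framework_was_removed(sample))
```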