I suspect that after your maintenance operation, Marathon may have registered with a new frameworkId and launched its own copies of your tasks (which is why you see doubles). However, the old Marathon frameworkId probably has a failover_timeout of a week, so it will continue to be considered "registered" but "disconnected". Check the /frameworks endpoint to see whether Mesos thinks you have two Marathons registered. If so, you can use the /teardown endpoint to unregister the old one, which will cause all of its tasks to be killed.
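For example, something roughly like this (untested; assuming your master's HTTP endpoint is reachable at master:5050, and using <old_framework_id> as a placeholder for whichever frameworkId belongs to the stale registration):

    # List registered frameworks and their ids; the stale Marathon should show
    # up as disconnected. The "frameworks" array in /master/state shows the
    # same information.
    curl http://master:5050/master/frameworks

    # Unregister the stale framework; Mesos will kill all tasks it still owns.
    curl -X POST http://master:5050/master/teardown \
         -d 'frameworkId=<old_framework_id>'

Just double-check the id first so you don't tear down the Marathon instance that is currently live.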
On Wed, Mar 30, 2016 at 4:56 AM, Alberto del Barrio <[email protected]> wrote:
> Hi haosdent,
>
> Thanks for your reply. It is actually very weird; it is the first time I have
> seen this situation in around one year of using Mesos.
> I am pasting here the truncated output you asked for. It shows one of the
> tasks with "Failed" state under "Active tasks":
>
> {
>     "executor_id": "",
>     "framework_id": "c857c625-25dc-4650-89b8-de4b597026ed-0000",
>     "id": "pixie.33f85e8f-f03b-11e5-af6c-fa6389efeef1",
>     "labels": [
>         ......................
>     ],
>     "name": "myTask",
>     "resources": {
>         "cpus": 4.0,
>         "disk": 0,
>         "mem": 2560.0,
>         "ports": "[31679-31679]"
>     },
>     "slave_id": "c857c625-25dc-4650-89b8-de4b597026ed-S878",
>     "state": "TASK_FAILED",
>     "statuses": [
>         {
>             "container_status": {
>                 "network_infos": [
>                     {
>                         "ip_address": "10.XX.XX.XX"
>                     }
>                 ]
>             },
>             "state": "TASK_RUNNING",
>             "timestamp": 1458657321.16671
>         },
>         {
>             "container_status": {
>                 "network_infos": [
>                     {
>                         "ip_address": "10.XX.XX.XX"
>                     }
>                 ]
>             },
>             "state": "TASK_FAILED",
>             "timestamp": 1459329310.13663
>         }
>     ]
> },
>
>
> On 03/30/16 13:30, haosdent wrote:
>
> >"Active tasks" with status "Failed"
>
> A bit weird here. According to my test, it should exist in "Completed
> Tasks". If possible, could you show your /master/state endpoint result? I
> think the "frameworks" node in the state response would be helpful for
> analyzing the problem.
>
> On Wed, Mar 30, 2016 at 6:26 PM, Alberto del Barrio <[email protected]> wrote:
>
>> Hi all,
>>
>> after a maintenance carried out on a Mesos cluster (0.25) using Marathon
>> (0.10) as the only scheduler, I ended up with double the number of tasks
>> for each application, but Marathon was recognizing only half of them.
>> To get rid of these orphaned tasks, I did a "kill PID" on them, so they
>> freed up their resources.
>>
>> My problem now is that the tasks I killed are still appearing in the
>> Mesos UI under "Active tasks" with status "Failed". This is not
>> affecting my system, but I would like to clean them up.
>> Googling, I can't find anything.
>> Can someone point me to a solution for cleaning up those tasks?
>>
>> Cheers,
>> Alberto.
>
>
> --
> Best Regards,
> Haosdent Huang

