I suspect that after your maintenance operation, Marathon may have
registered with a new frameworkId and launched its own copies of your tasks
(which is why you see double). However, the old Marathon frameworkId
probably has a failover_timeout of a week, so it will continue to be
considered "registered", but "disconnected".
Check the /frameworks endpoint to see if Mesos thinks you have two
Marathons registered.
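Something along these lines would do it (a rough, untested sketch; the
master address is a placeholder, and I'm reading /master/state, the
endpoint haosdent already pointed you at, since its "frameworks" list
carries the same information):

# List the frameworks the master knows about and whether each is active.
# Assumes the master is reachable at http://<master>:5050 (placeholder).
import json
from urllib.request import urlopen

MASTER = "http://<master>:5050"  # replace with your master's address

state = json.load(urlopen(MASTER + "/master/state"))
for fw in state.get("frameworks", []):
    status = "active" if fw.get("active") else "disconnected"
    print(fw["id"], fw["name"], status)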
If so, you can use the /teardown endpoint to unregister the old one, which
will cause all of its tasks to be killed.
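Roughly like this (again an untested sketch; I'm assuming the path is
/master/teardown, and the framework id below is a placeholder for whichever
one shows up as the stale, disconnected Marathon). Be careful: teardown
kills every task belonging to that framework.

# POST to the master's /master/teardown endpoint to unregister a framework.
# This kills all of that framework's tasks, so double-check the id first.
from urllib.request import Request, urlopen

MASTER = "http://<master>:5050"          # replace with your master's address
OLD_FRAMEWORK_ID = "<old-framework-id>"  # the disconnected Marathon's id

req = Request(MASTER + "/master/teardown",
              data=("frameworkId=" + OLD_FRAMEWORK_ID).encode())
resp = urlopen(req)  # urllib sends this as a POST because data is set
print(resp.status)   # 200 means the framework and its tasks were removed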

On Wed, Mar 30, 2016 at 4:56 AM, Alberto del Barrio <
[email protected]> wrote:

> Hi haosdent,
>
> thanks for your reply. It is actually very weird; this is the first time
> I have seen this situation in around one year of using Mesos.
> I am pasting here the truncated output you asked for. It shows one of the
> tasks with "Failed" state under "Active tasks":
>
> {
>     "executor_id": "",
>     "framework_id": "c857c625-25dc-4650-89b8-de4b597026ed-0000",
>     "id": "pixie.33f85e8f-f03b-11e5-af6c-fa6389efeef1",
>     "labels": [
>        ......................
>     ],
>     "name": "myTask",
>     "resources": {
>         "cpus": 4.0,
>         "disk": 0,
>         "mem": 2560.0,
>         "ports": "[31679-31679]"
>     },
>     "slave_id": "c857c625-25dc-4650-89b8-de4b597026ed-S878",
>     "state": "TASK_FAILED",
>     "statuses": [
>         {
>             "container_status": {
>                 "network_infos": [
>                     {
>                         "ip_address": "10.XX.XX.XX"
>                     }
>                 ]
>             },
>             "state": "TASK_RUNNING",
>             "timestamp": 1458657321.16671
>         },
>         {
>             "container_status": {
>                 "network_infos": [
>                     {
>                         "ip_address": "10.XX.XX.XX"
>                     }
>                 ]
>             },
>             "state": "TASK_FAILED",
>             "timestamp": 1459329310.13663
>         }
>     ]
> },
>
>
>
>
> On 03/30/16 13:30, haosdent wrote:
>
> >"Active tasks" with status "Failed"
> A bit weird here. According to my test, it should exist under "Completed
> Tasks". If possible, could you show your /master/state endpoint result? I
> think the frameworks node in the state response would be helpful for
> analyzing the problem.
>
> On Wed, Mar 30, 2016 at 6:26 PM, Alberto del Barrio <
> <[email protected]>[email protected]>
> wrote:
>
>> Hi all,
>>
>> after a maintenance carried out on a Mesos cluster (0.25) using Marathon
>> (0.10) as the only scheduler, I ended up with double the tasks for each
>> application, but Marathon was recognizing only half of them.
>> To get rid of these orphaned tasks, I did a "kill PID" on them, so they
>> freed up their resources.
>>
>> My problem now is that the tasks I killed are still appearing in the
>> Mesos UI under "Active tasks" with status "Failed". This is not affecting
>> my system, but I would like to clean them up. Googling hasn't turned up
>> anything.
>> Can someone point me to a solution for cleaning up those tasks?
>>
>> Cheers,
>> Alberto.
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>
>
>
