Suppose your --work_dir is /tmp/mesos. Then you could run:

$ find /tmp/mesos -name $YOUR_EXECUTOR_ID

That gives you a list of folders, and you can then use lsof on them.
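
If you are not sure which work_dir your agent is using, a quick sketch (assuming
the agent runs as mesos-slave and that the flag was passed on its command line;
it might instead come from an environment variable or a config file) is to check
the process arguments:

$ ps aux | grep [m]esos-slave
# look for --work_dir=<path> in the output; that is the directory to search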

As an example, my executor ID is "test" here.

$ find /tmp/mesos/ -name 'test'
/tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test

When I execute

$ lsof /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/

(keep in mind that I append runs/latest here), you can see the PID list:

COMMAND     PID      USER   FD   TYPE DEVICE SIZE/OFF       NODE NAME
mesos-exe 21811 haosdent  cwd    DIR    8,3        6 3221463220
/tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
sleep     21847 haosdent  cwd    DIR    8,3        6 3221463220
/tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11

Kill all of them.
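
As a shortcut, lsof's -t flag prints only the PIDs, so you can collect and kill
them in one step. A minimal sketch, assuming the same runs/latest sandbox path
as above, stored here in a hypothetical SANDBOX variable:

$ SANDBOX=/tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/
$ lsof -t "$SANDBOX"            # prints just the PIDs, like 21811 and 21847 above
$ kill $(lsof -t "$SANDBOX")    # double-check the PID list before sending the signal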

On Thu, Apr 7, 2016 at 11:23 PM, June Taylor <[email protected]> wrote:

> I do have the executor ID. Can you advise how to kill it?
>
> I have one master and three slaves. Each slave has one of these orphans.
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Thu, Apr 7, 2016 at 10:14 AM, haosdent <[email protected]> wrote:
>
>> >Going to this slave I can find an executor within the mesos working
>> directory which matches this framework ID
>> The quickest way here is to use kill on the slave if you can find the
>> mesos-executor PID. You can use lsof/fuser or dig through the logs to find
>> the executor PID.
>>
>> However, it still seems weird given your feedback. Do you have multiple
>> masters, and did a failover happen on your master? In that case the slave
>> could not connect to the new master and the tasks would become orphans.
>>
>> On Thu, Apr 7, 2016 at 11:06 PM, June Taylor <[email protected]> wrote:
>>
>>> Here is one of three orphaned tasks (first two octets of IP removed):
>>>
>>> "orphan_tasks": [
>>>         {
>>>             "executor_id": "",
>>>             "name": "Task 1",
>>>             "framework_id": "14cddded-e692-4838-9893-6e04a81481d8-0006",
>>>             "state": "TASK_RUNNING",
>>>             "statuses": [
>>>                 {
>>>                     "timestamp": 1459887295.05554,
>>>                     "state": "TASK_RUNNING",
>>>                     "container_status": {
>>>                         "network_infos": [
>>>                             {
>>>                                 "ip_addresses": [
>>>                                     {
>>>                                         "ip_address": "xxx.xxx.163.205"
>>>                                     }
>>>                                 ],
>>>                                 "ip_address": "xxx.xxx.163.205"
>>>                             }
>>>                         ]
>>>                     }
>>>                 }
>>>             ],
>>>             "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
>>>             "id": "1",
>>>             "resources": {
>>>                 "mem": 112640.0,
>>>                 "disk": 0.0,
>>>                 "cpus": 30.0
>>>             }
>>>         }
>>>
>>> Going to this slave I can find an executor within the mesos working
>>> directory which matches this framework ID. Reviewing the stdout messaging
>>> within indicates the program has finished its work, but it is still
>>> holding these resources open.
>>>
>>> This framework ID is not shown as Active in the main Mesos Web UI, but
>>> does show up if you display the Slave's web UI.
>>>
>>> The resources consumed count towards the Idle pool, and have resulted in
>>> zero available resources for other Offers.
>>>
>>>
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Thu, Apr 7, 2016 at 9:46 AM, haosdent <[email protected]> wrote:
>>>
>>>> > pyspark executors hanging around and consuming resources marked as
>>>> Idle in mesos Web UI
>>>>
>>>> Do you have some logs about this?
>>>>
>>>> >is there an API call I can make to kill these orphans?
>>>>
>>>> As far as I know, the Mesos agent tries to clean up orphan containers when
>>>> it restarts. But I'm not sure the orphans I mean here are the same as yours.
>>>>
>>>> On Thu, Apr 7, 2016 at 10:21 PM, June Taylor <[email protected]> wrote:
>>>>
>>>>> Greetings mesos users!
>>>>>
>>>>> I am debugging an issue with pyspark executors hanging around and
>>>>> consuming resources marked as Idle in mesos Web UI. These tasks also show
>>>>> up in the orphaned_tasks key in `mesos state`.
>>>>>
>>>>> I'm first wondering how to clear them out - is there an API call I can
>>>>> make to kill these orphans? Secondly, how did this happen at all?
>>>>>
>>>>> Thanks,
>>>>> June Taylor
>>>>> System Administrator, Minnesota Population Center
>>>>> University of Minnesota
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


-- 
Best Regards,
Haosdent Huang
