Suppose your --work_dir=/tmp/mesos. Then you could run

  $ find /tmp/mesos -name $YOUR_EXECUTOR_ID

to get a list of folders, and then use lsof on them. As an example, my executor id is "test" here:

  $ find /tmp/mesos/ -name 'test'
  /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test

When I execute

  $ lsof /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/

(keep in mind that I append runs/latest here), I can see the pid list:

  COMMAND     PID     USER FD  TYPE DEVICE SIZE/OFF NODE       NAME
  mesos-exe 21811 haosdent cwd  DIR    8,3        6 3221463220 /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
  sleep     21847 haosdent cwd  DIR    8,3        6 3221463220 /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11

Kill all of them.
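If it helps, a rough sketch that strings those steps together is below. It is only an illustration, not an official tool: it assumes --work_dir=/tmp/mesos, that lsof is installed on the agent host, and an EXECUTOR_ID shell variable introduced here just for the example. Double-check the reported PIDs before killing anything.

  # Sketch: locate the executor sandbox(es), then kill every process that
  # lsof reports against the latest run directory.
  EXECUTOR_ID=test                                    # example id from above
  for dir in $(find /tmp/mesos -type d -name "$EXECUTOR_ID"); do
      pids=$(lsof -t "$dir/runs/latest/" 2>/dev/null)  # -t prints bare PIDs
      if [ -n "$pids" ]; then
          echo "Killing PIDs in $dir: $pids"
          kill $pids
      fi
  done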
On Thu, Apr 7, 2016 at 11:23 PM, June Taylor <[email protected]> wrote:

> I do have the executor ID. Can you advise how to kill it?
>
> I have one master and three slaves. Each slave has one of these orphans.
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Thu, Apr 7, 2016 at 10:14 AM, haosdent <[email protected]> wrote:
>
>> > Going to this slave I can find an executor within the mesos working
>> directory which matches this framework ID
>>
>> The quickest way here is to use kill on the slave, if you can find the
>> mesos-executor pid. You can use lsof/fuser or dig through the logs to
>> find the executor pid.
>>
>> However, it still seems weird according to your feedback. Do you have
>> multiple masters, and did a failover happen on your master, so that the
>> slave could not connect to the new master and the tasks became orphans?
>>
>> On Thu, Apr 7, 2016 at 11:06 PM, June Taylor <[email protected]> wrote:
>>
>>> Here is one of three orphaned tasks (first two octets of IP removed):
>>>
>>> "orphan_tasks": [
>>>     {
>>>         "executor_id": "",
>>>         "name": "Task 1",
>>>         "framework_id": "14cddded-e692-4838-9893-6e04a81481d8-0006",
>>>         "state": "TASK_RUNNING",
>>>         "statuses": [
>>>             {
>>>                 "timestamp": 1459887295.05554,
>>>                 "state": "TASK_RUNNING",
>>>                 "container_status": {
>>>                     "network_infos": [
>>>                         {
>>>                             "ip_addresses": [
>>>                                 {
>>>                                     "ip_address": "xxx.xxx.163.205"
>>>                                 }
>>>                             ],
>>>                             "ip_address": "xxx.xxx.163.205"
>>>                         }
>>>                     ]
>>>                 }
>>>             }
>>>         ],
>>>         "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
>>>         "id": "1",
>>>         "resources": {
>>>             "mem": 112640.0,
>>>             "disk": 0.0,
>>>             "cpus": 30.0
>>>         }
>>>     }
>>>
>>> Going to this slave, I can find an executor within the mesos working
>>> directory which matches this framework ID. Reviewing the stdout messages
>>> within indicates the program has finished its work, but it is still
>>> holding these resources open.
>>>
>>> This framework ID is not shown as Active in the main Mesos Web UI, but
>>> does show up if you display the slave's web UI.
>>>
>>> The resources consumed count towards the Idle pool, and have resulted in
>>> zero available resources for other offers.
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Thu, Apr 7, 2016 at 9:46 AM, haosdent <[email protected]> wrote:
>>>
>>>> > pyspark executors hanging around and consuming resources marked as
>>>> Idle in mesos Web UI
>>>>
>>>> Do you have some logs about this?
>>>> > is there an API call I can make to kill these orphans?
>>>>
>>>> As far as I know, the mesos agent tries to clean up orphan containers
>>>> when it restarts. But I am not sure the orphans I mean here are the
>>>> same as yours.
>>>>
>>>> On Thu, Apr 7, 2016 at 10:21 PM, June Taylor <[email protected]> wrote:
>>>>
>>>>> Greetings mesos users!
>>>>>
>>>>> I am debugging an issue with pyspark executors hanging around and
>>>>> consuming resources marked as Idle in the mesos Web UI. These tasks
>>>>> also show up in the orphan_tasks key in `mesos state`.
>>>>>
>>>>> I'm first wondering how to clear them out - is there an API call I can
>>>>> make to kill these orphans? Secondly, how did this happen at all?
>>>>>
>>>>> Thanks,
>>>>> June Taylor
>>>>> System Administrator, Minnesota Population Center
>>>>> University of Minnesota
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>
>> --
>> Best Regards,
>> Haosdent Huang
>

--
Best Regards,
Haosdent Huang
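As a side note on the API question raised in the thread: the orphan list June pasted comes from the master's state output, so it can also be fetched directly and filtered. A minimal sketch, assuming the master listens on port 5050, that this Mesos version still serves /master/state.json, and that jq is installed; note this only lists the orphans, it does not kill them:

  $ curl -s http://<master-host>:5050/master/state.json | jq '.orphan_tasks'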

