Hmm, sorry I didn't express my idea clearly. I mean kill those orphan tasks here.
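For example, on each slave something like this should do it (just a rough sketch, assuming --work_dir=/tmp/mesos as in my earlier mail; $YOUR_EXECUTOR_ID is a placeholder for your actual executor id):

  # Find the orphan executor's sandbox under the work_dir, then kill every
  # process that still has the latest run directory open (the mesos-executor
  # itself and whatever it spawned).
  for dir in $(find /tmp/mesos -type d -name "$YOUR_EXECUTOR_ID"); do
      lsof "$dir"/runs/latest/ | awk 'NR > 1 {print $2}' | sort -u | xargs -r kill
  done

If the processes refuse to die you can fall back to kill -9.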
On Thu, Apr 7, 2016 at 11:57 PM, June Taylor <[email protected]> wrote:

> Forgive my ignorance, are you literally saying I should just sigkill these
> instances? How will that clean up the mesos orphans?
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Thu, Apr 7, 2016 at 10:44 AM, haosdent <[email protected]> wrote:
>
>> Suppose your --work_dir is /tmp/mesos. Then you could run
>>
>> $ find /tmp/mesos -name $YOUR_EXECUTOR_ID
>>
>> That gives you a list of folders, and you can then use lsof on them.
>>
>> As an example, my executor id is "test" here:
>>
>> $ find /tmp/mesos/ -name 'test'
>> /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test
>>
>> When I execute
>>
>> $ lsof /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0002/executors/test/runs/latest/
>>
>> (keep in mind that I appended runs/latest here), I can see the pid list:
>>
>> COMMAND     PID USER     FD  TYPE DEVICE SIZE/OFF       NODE NAME
>> mesos-exe 21811 haosdent cwd DIR  8,3           6 3221463220 /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
>> sleep     21847 haosdent cwd DIR  8,3           6 3221463220 /tmp/mesos/0/slaves/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-S0/frameworks/138ee255-c8ef-4caa-8ff2-c0c02f70b4f5-0003/executors/test/runs/efecb119-1019-4629-91ab-fec7724a0f11
>>
>> Kill all of them.
>>
>> On Thu, Apr 7, 2016 at 11:23 PM, June Taylor <[email protected]> wrote:
>>
>>> I do have the executor ID. Can you advise how to kill it?
>>>
>>> I have one master and three slaves. Each slave has one of these orphans.
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Thu, Apr 7, 2016 at 10:14 AM, haosdent <[email protected]> wrote:
>>>
>>>> > Going to this slave I can find an executor within the mesos working
>>>> > directory which matches this framework ID
>>>>
>>>> The quickest way here is to use kill on the slave, if you can find the
>>>> mesos-executor pid. You can use lsof/fuser, or dig through the logs, to
>>>> find the executor pid.
>>>>
>>>> However, this still seems weird given your feedback. Do you have
>>>> multiple masters, and did a failover happen on the master? In that case
>>>> the slave could fail to connect to the new master and its tasks would
>>>> become orphans.
>>>>
>>>> On Thu, Apr 7, 2016 at 11:06 PM, June Taylor <[email protected]> wrote:
>>>>
>>>>> Here is one of three orphaned tasks (first two octets of IP removed):
>>>>>
>>>>> "orphan_tasks": [
>>>>>     {
>>>>>         "executor_id": "",
>>>>>         "name": "Task 1",
>>>>>         "framework_id": "14cddded-e692-4838-9893-6e04a81481d8-0006",
>>>>>         "state": "TASK_RUNNING",
>>>>>         "statuses": [
>>>>>             {
>>>>>                 "timestamp": 1459887295.05554,
>>>>>                 "state": "TASK_RUNNING",
>>>>>                 "container_status": {
>>>>>                     "network_infos": [
>>>>>                         {
>>>>>                             "ip_addresses": [
>>>>>                                 {
>>>>>                                     "ip_address": "xxx.xxx.163.205"
>>>>>                                 }
>>>>>                             ],
>>>>>                             "ip_address": "xxx.xxx.163.205"
>>>>>                         }
>>>>>                     ]
>>>>>                 }
>>>>>             }
>>>>>         ],
>>>>>         "slave_id": "182cf09f-0843-4736-82f1-d913089d7df4-S83",
>>>>>         "id": "1",
>>>>>         "resources": {
>>>>>             "mem": 112640.0,
>>>>>             "disk": 0.0,
>>>>>             "cpus": 30.0
>>>>>         }
>>>>>     }
>>>>> ]
>>>>>
>>>>> Going to this slave I can find an executor within the mesos working
>>>>> directory which matches this framework ID. Reviewing the stdout
>>>>> messaging within indicates the program has finished its work. But it
>>>>> is still holding these resources open.
>>>>>
>>>>> This framework ID is not shown as Active in the main Mesos Web UI, but
>>>>> it does show up if you display the slave's web UI.
>>>>>
>>>>> The resources consumed count towards the Idle pool, and have resulted
>>>>> in zero available resources for other offers.
>>>>>
>>>>> Thanks,
>>>>> June Taylor
>>>>> System Administrator, Minnesota Population Center
>>>>> University of Minnesota
>>>>>
>>>>> On Thu, Apr 7, 2016 at 9:46 AM, haosdent <[email protected]> wrote:
>>>>>
>>>>>> > pyspark executors hanging around and consuming resources marked as
>>>>>> > Idle in mesos Web UI
>>>>>>
>>>>>> Do you have any logs about this?
>>>>>>
>>>>>> > is there an API call I can make to kill these orphans?
>>>>>>
>>>>>> As far as I know, the mesos agent tries to clean up orphan containers
>>>>>> when it restarts. But I am not sure the orphans I mean here are the
>>>>>> same as yours.
>>>>>>
>>>>>> On Thu, Apr 7, 2016 at 10:21 PM, June Taylor <[email protected]> wrote:
>>>>>>
>>>>>>> Greetings mesos users!
>>>>>>>
>>>>>>> I am debugging an issue with pyspark executors hanging around and
>>>>>>> consuming resources marked as Idle in the mesos Web UI. These tasks
>>>>>>> also show up in the orphan_tasks key in `mesos state`.
>>>>>>>
>>>>>>> I'm first wondering how to clear them out - is there an API call I
>>>>>>> can make to kill these orphans? And secondly, how did this happen at
>>>>>>> all?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> June Taylor
>>>>>>> System Administrator, Minnesota Population Center
>>>>>>> University of Minnesota
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>
>> --
>> Best Regards,
>> Haosdent Huang

--
Best Regards,
Haosdent Huang
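P.S. As discussed above, the cleanup itself happens on the slave, but to see at a glance which slaves and executors are affected you can pull the orphan list straight out of the master state. A sketch only, assuming the master listens on the default port 5050 and jq is installed; MASTER_HOST is a placeholder:

  # List framework, slave, executor and task ids for every orphan task the
  # master currently knows about.
  curl -s http://MASTER_HOST:5050/master/state.json | jq '.orphan_tasks[] | {framework_id, slave_id, executor_id, id}'

That is the same JSON June pasted above, so the slave_id and framework_id there tell you where to look in the work_dir.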

