[ 
https://issues.apache.org/jira/browse/OODT-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imesha Sudasingha updated OODT-692:
-----------------------------------
    Fix Version/s:     (was: 2.0)
                       (was: 1.9)

> Use lsof to stop Workflow/Resource Manager task/job PIDs 
> ---------------------------------------------------------
>
>                 Key: OODT-692
>                 URL: https://issues.apache.org/jira/browse/OODT-692
>             Project: OODT
>          Issue Type: Bug
>          Components: pge wrapper framework, resource manager, workflow manager
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>            Priority: Major
>              Labels: killjob, manager, oodt, pid, resource, unix, workflow
>
> We can use a combination of lsof, JobDir, and WorkflowInstanceId to find and 
> kill the underlying process and fully stop a job kicked off by the resource 
> manager and workflow manager. I've been testing this process by hand on the 
> ASO process and it works in practice, so we should automate it. For example:
> {noformat}
> [snowdeploy@trango-private bin]$ lsof -p 37558
> COMMAND   PID       USER   FD   TYPE   DEVICE SIZE/OFF      NODE NAME
> idl     37558 snowdeploy  cwd    DIR    253,2     4096 488284165 /data/jobs/CASI/ISSP/20140511f1_184151_1399903013836
> ..
> {noformat}
> This reveals to us that process ID 37558 (one of the IDL jobs running in ASO 
> for the ORTHO process) corresponds to _JobDir_ 
> {noformat}
> /data/jobs/CASI/ISSP/20140511f1_184151_1399903013836
> {noformat}
> We can also find out from WorkflowInstanceMetadata that the Workflow 
> Instance Id corresponding to the _JobDir_ containing _184151_ is 
> _726af17c-c131-4682-845e-4ef6b4a7eeee_.
> So, from a Workflow Instance Id, we need:
> # the JobDir resolved by CAS-PGE. If it's not a CAS-PGE job, the 
> WorkflowTask needs to specify a JobDir, or else this functionality will 
> simply print a message saying "Kill without JobDir not supported".
> # a map of processes to interrogate with lsof, e.g., PCS_JobKillProcessName
> # the use of lsof to interrogate the PID table, find the job corresponding 
> to the JobDir, and then kill it. If PCS_JobKillProcessName is not specified, 
> then interrogate all jobs to determine the job to kill.
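
The three steps above could be sketched roughly as follows. This is only an illustration in Python, not the OODT implementation: the function names (`pids_with_cwd`, `find_job_pids`, `kill_job`) are hypothetical, the `process_name` argument stands in for whatever value PCS_JobKillProcessName would carry, and the JobDir is assumed to have already been resolved from the Workflow Instance Id. It relies on lsof's field output mode (`-F pn`), which emits one `p<pid>` line per process followed by `n<name>` lines for its files, restricted here to the `cwd` descriptor.

```python
import os
import signal
import subprocess


def pids_with_cwd(lsof_field_output, job_dir):
    """Parse `lsof -F pn` output and return PIDs whose cwd equals job_dir."""
    target = os.path.normpath(job_dir)
    pids = []
    current_pid = None
    for line in lsof_field_output.splitlines():
        if line.startswith("p"):
            # A "p" line starts a new process set, e.g. "p37558".
            current_pid = int(line[1:])
        elif line.startswith("n") and current_pid is not None:
            # An "n" line carries the file name, here the cwd path.
            if os.path.normpath(line[1:]) == target and current_pid not in pids:
                pids.append(current_pid)
        # Other field lines (such as "fcwd") are ignored.
    return pids


def find_job_pids(job_dir, process_name=None):
    """Ask lsof for the cwd of processes, optionally filtered by command name."""
    cmd = ["lsof", "-a", "-d", "cwd", "-F", "pn"]
    if process_name:
        # Hypothetical: value taken from something like PCS_JobKillProcessName.
        cmd += ["-c", process_name]
    # lsof exits non-zero when nothing matches, so do not raise on failure.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return pids_with_cwd(result.stdout, job_dir)


def kill_job(job_dir, process_name=None):
    """Send SIGTERM to every process whose cwd is job_dir; return the PIDs."""
    if not job_dir:
        print("Kill without JobDir not supported")
        return []
    pids = find_job_pids(job_dir, process_name)
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

With the example from this issue, `kill_job("/data/jobs/CASI/ISSP/20140511f1_184151_1399903013836", "idl")` would only look at processes whose command name starts with `idl`; omitting the second argument scans every process visible to the calling user.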



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
