[ https://issues.apache.org/jira/browse/OODT-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Imesha Sudasingha updated OODT-692:
-----------------------------------
    Fix Version/s:     (was: 2.0)
                       (was: 1.9)

> Use lsof to stop Workflow/Resource Manager task/job PIDs
> ---------------------------------------------------------
>
>                 Key: OODT-692
>                 URL: https://issues.apache.org/jira/browse/OODT-692
>             Project: OODT
>          Issue Type: Bug
>          Components: pge wrapper framework, resource manager, workflow manager
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>            Priority: Major
>              Labels: killjob, manager, oodt, pid, resource, unix, workflow
>
> We can exploit a combination of LSOF, JobDir, and WorkflowInstanceId to
> actually kill the process ID and fully stop a job kicked off by the resource
> manager and workflow manager. I've been testing this process by hand on the
> ASO process and it's totally usable in practice, so we should automate it.
> For example:
> {noformat}
> [snowdeploy@trango-private bin]$ lsof -p 37558
> COMMAND   PID       USER   FD   TYPE DEVICE SIZE/OFF      NODE NAME
> idl     37558 snowdeploy  cwd    DIR  253,2     4096 488284165 /data/jobs/CASI/ISSP/20140511f1_184151_1399903013836
> ..
> {noformat}
> This reveals to us that the process ID 37558 (one of the IDL jobs running in
> ASO for the ORTHO process) corresponds to _JobDir_
> {noformat}
> /data/jobs/CASI/ISSP/20140511f1_184151_1399903013836
> {noformat}
> We can also find out from WorkflowInstanceMetadata that the _JobDir_
> corresponding to the line _184151_ is _726af17c-c131-4682-845e-4ef6b4a7eeee_.
> So, from a Workflow Instance Id, we need:
> # the resolved JobDir by CAS-PGE. If it's not a CAS-PGE job, we need the
> WorkflowTask to specify a JobDir, or else this functionality will simply
> print out a message saying "Kill without JobDir not supported".
> # a map of processes to interrogate with lsof, e.g., PCS_JobKillProcessName
> # the use of lsof to interrogate the PID table, find the job with the
> corresponding JobDir, and then kill it. If PCS_JobKillProcessName is not
> specified, then interrogate all jobs to determine the job to kill.
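
The three steps listed in the issue could be sketched roughly as follows. This is only an illustration in Python, not the OODT implementation: the function names (parse_lsof_fields, pids_for_job_dir, run_lsof, kill_job) are hypothetical, and the use of lsof's machine-readable -F output instead of the default table is an assumption made here for easier parsing. It also assumes each job process has the JobDir as its current working directory, as in the example above.

```python
import os
import signal
import subprocess


def parse_lsof_fields(output):
    """Parse `lsof -F pcn` field output into (pid, command, path) tuples.

    Each output line starts with a one-character field tag:
    'p' = PID, 'c' = command name, 'n' = file name. Other tags are ignored.
    """
    entries = []
    pid, command = None, None
    for line in output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":          # a new process set begins
            pid, command = int(value), None
        elif tag == "c":        # command name of the current process
            command = value
        elif tag == "n" and pid is not None:  # file name (the cwd path here)
            entries.append((pid, command, value))
    return entries


def pids_for_job_dir(lsof_output, job_dir):
    """Return the PIDs whose working directory is exactly job_dir."""
    target = os.path.normpath(job_dir)
    return [pid for pid, _cmd, cwd in parse_lsof_fields(lsof_output)
            if os.path.normpath(cwd) == target]


def run_lsof(process_name=None):
    """List the cwd of running processes, optionally limited by command name.

    process_name plays the role of PCS_JobKillProcessName from the issue;
    when it is None, all visible processes are interrogated.
    """
    cmd = ["lsof", "-a", "-d", "cwd", "-F", "pcn"]
    if process_name:
        cmd += ["-c", process_name]
    # lsof exits non-zero when nothing matches, so do not raise on failure.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


def kill_job(job_dir, process_name=None, sig=signal.SIGTERM):
    """Signal every process whose cwd is job_dir; return the PIDs signalled."""
    if not job_dir:
        print("Kill without JobDir not supported")
        return []
    pids = pids_for_job_dir(run_lsof(process_name), job_dir)
    for pid in pids:
        os.kill(pid, sig)
    return pids
```

A caller would first resolve the JobDir from the Workflow Instance Id, then call something like kill_job(job_dir, process_name="idl"). Matching on the exact working directory keeps the sketch from touching processes that merely have files open somewhere under the job tree.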