Hi Everyone,
I've been looking at SPANK plugins lately out of my own curiosity and am
wondering how one would go about writing to a job's *stdout* from a SPANK
plugin. I can write to stderr without a problem because task 0's stderr gets
dup'd onto fd 2 in _slurmd_job_log_init (at least, I think that's why).
My goal is to be able to write to stdout at the end of a batch script, but
I'd like to solve the problem generically, so that every applicable SPANK
task context can access the stdout of its associated job step, not just
stderr.
As best I can tell, only the "slurm_spank_task_init" function has access to
the tasks' stdout.
I've come up with two approaches for making stdout available to SPANK, and
I'm curious which (if any) is preferable.
Approach #1 - Doing it all in SPANK with minimal changes to SLURM:
- Define a new "item" called "S_JOB_STDOUT_FD" or "S_STEP_STDOUT_FD" (going
by some of the code, I'm not sure of the difference between a job and a
step)
- The implementation of that item would be something like this:

      case S_JOB_STDOUT_FD:
          /* stdout fd of the first local task, or -1 if there is no slurmd job */
          p2int = va_arg(vargs, int *);
          if (slurmd_job)
              *p2int = slurmd_job->task[0]->stdout_fd;
          else
              *p2int = -1;
          break;
A SPANK plugin that wanted to write to a job/step's stdout would dup() the
fd returned for S_JOB_STDOUT_FD and store the copy in a global variable. Any
of the task-related SPANK contexts could then write to that dup()'d fd to
reach standard out for that particular step.
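As a rough illustration of the plugin side of Approach #1, here is a standalone C sketch that runs outside SLURM. A pipe would stand in for the step's stdout, and sim_task_init(), sim_task_write() and sim_task_all_exit() are hypothetical stand-ins for SPANK callbacks; in a real plugin the fd would come from spank_get_item() with the proposed S_JOB_STDOUT_FD item, which does not exist today.

```c
#include <string.h>
#include <unistd.h>

/* Plugin-private copy of the step's stdout fd, shared by all callbacks. */
static int plugin_stdout_fd = -1;

/* Hypothetical stand-in for slurm_spank_task_init(). In a real plugin the
 * fd would come from spank_get_item() with the proposed (not existing)
 * S_JOB_STDOUT_FD item; here the caller passes it in directly. */
int sim_task_init(int step_stdout_fd)
{
    plugin_stdout_fd = dup(step_stdout_fd);
    return (plugin_stdout_fd < 0) ? -1 : 0;
}

/* Hypothetical stand-in for any later task-context callback that wants
 * to write to the step's stdout through the saved copy. */
int sim_task_write(const char *msg)
{
    size_t len = strlen(msg);

    if (plugin_stdout_fd < 0)
        return -1;
    return (write(plugin_stdout_fd, msg, len) == (ssize_t) len) ? 0 : -1;
}

/* Hypothetical stand-in for the proposed "all tasks exited" callback:
 * drop the plugin's copy so a pipe reader can eventually see EOF. */
void sim_task_all_exit(void)
{
    if (plugin_stdout_fd >= 0) {
        close(plugin_stdout_fd);
        plugin_stdout_fd = -1;
    }
}
```

Usage, assuming the caller creates a pipe p: call sim_task_init(p[1]), write with sim_task_write(), then call sim_task_all_exit() and close p[1] before reading p[0] to EOF. The single static global mirrors the "store it in a global variable" idea, which also means one extra fd per plugin.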
This would probably also require a new SPANK function, called something like
spank_task_all_exit, that runs after all tasks have exited but before
_wait_for_io is called. Inside the plugin, this function would close() the
dup'd fd; otherwise, when a job/step's stdout is a pipe, SLURM would
deadlock waiting for the pipe to close, because the reading end never sees
EOF while any copy of the write end is still open.
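The deadlock follows from ordinary pipe semantics, which a small standalone C sketch can show without SLURM: a reader gets EOF only once every write end is closed. The function name eof_after_slurm_close() is hypothetical, and the read end is made non-blocking so the sketch reports "would block" instead of hanging the way a blocking reader would.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a reader sees EOF after "SLURM" closes its write end of the
 * stdout pipe, 0 if the reader would still block, -1 on error. When
 * plugin_keeps_copy is nonzero, a dup()'d write end (the plugin's copy)
 * stays open during the read, as it would without the proposed callback. */
int eof_after_slurm_close(int plugin_keeps_copy)
{
    int p[2];
    int copy = -1;
    int flags;
    int result;
    char c;
    ssize_t n;

    if (pipe(p) < 0)
        return -1;

    /* Non-blocking read end: observe "would block" instead of hanging. */
    flags = fcntl(p[0], F_GETFL);
    if (flags < 0 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (plugin_keeps_copy) {
        copy = dup(p[1]);       /* the plugin's private copy of stdout */
        if (copy < 0) {
            close(p[0]);
            close(p[1]);
            return -1;
        }
    }
    close(p[1]);                /* SLURM closes its own write end */

    n = read(p[0], &c, 1);
    if (n == 0)
        result = 1;             /* EOF: no write ends remain */
    else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        result = 0;             /* a blocking reader would hang here */
    else
        result = -1;

    if (copy >= 0)
        close(copy);            /* what the proposed callback would do */
    close(p[0]);
    return result;
}
```

Calling eof_after_slurm_close(0) should report EOF, while eof_after_slurm_close(1) should report that the reader would block, which is the hang described above.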
Approach #2 - Doing it all in SLURM:
- In _slurmd_job_log_init (where stderr gets dup'd), dup2() stdout_fd onto
STDOUT_FILENO.
- In io_close_all(), open /dev/null and dup2() it onto STDOUT_FILENO, which
closes the dup'd stdout. Without this, the same deadlock described above
would occur when stdout is a pipe.
With this approach, a plugin could simply call printf() (in the right
context) to write to the job's stdout.
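A standalone C sketch of Approach #2, again simulated outside SLURM: attach_job_stdout() and detach_job_stdout() are hypothetical analogues of the proposed changes to _slurmd_job_log_init and io_close_all(), not existing SLURM functions. One assumption worth flagging: printf() output is buffered by stdio (fully buffered when stdout is a pipe or file), so the sketch calls fflush(stdout) before each dup2(); otherwise text printed by a plugin could still be sitting in the buffer when fd 1 is switched to /dev/null and would then be discarded.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical analogue of the proposed _slurmd_job_log_init() change:
 * make fd 1 refer to the step's stdout. */
int attach_job_stdout(int job_stdout_fd)
{
    fflush(stdout);     /* keep earlier buffered output out of the job */
    return (dup2(job_stdout_fd, STDOUT_FILENO) < 0) ? -1 : 0;
}

/* Hypothetical analogue of the proposed io_close_all() change: point fd 1
 * at /dev/null, which implicitly closes the dup'd stdout so a pipe reader
 * can see EOF once the other write ends are closed. */
int detach_job_stdout(void)
{
    int devnull;

    fflush(stdout);     /* push anything printf() buffered before detaching */
    devnull = open("/dev/null", O_WRONLY);
    if (devnull < 0)
        return -1;
    if (dup2(devnull, STDOUT_FILENO) < 0) {
        close(devnull);
        return -1;
    }
    close(devnull);
    return 0;
}
```

In a test, one would save the original stdout with dup(STDOUT_FILENO), attach a pipe's write end, printf() a line, detach, close the pipe's write end, read the line back, and restore fd 1 with dup2().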
Now that I've typed all this out, I'm not really a fan of Approach #1. It
exposes implementation details to SPANK that I don't think SPANK should
have to worry about, such as closing file descriptors at the right time to
avoid a deadlock. It also doesn't scale well, since each plugin that wanted
to access stdout would need its own file descriptor.
Any thoughts?
Best,
Aaron