The proctrack/cgroup plugin is designed to more easily support PIDs from processes launched outside of slurm. As I recall, the path used for cgroups is of this form: user_id/job_id/step_id/task_id
For example, PAM could put processes into the appropriate cgroup, at least to the level of the user_id directory, although I don't know how you handle binding tasks to cpus if slurm isn't launching the tasks.

Quoting Michel Bourget <[email protected]>:

> Hi all,
>
> was the issue of monitoring pids coming-and-going-away addressed ( or
> debated ) in the past ( or the future tbd) in regards to proctrack and
> job_acct_gather ?
>
> I mean, since pids can fork() children and go away later, proctrack seems
> not to able to dynamically track this since it's "on-demand". Same for
> jobacct_gather since it's set "in stone" when a step is launched.
> And, because proctrack is on-demand and jobacct_gather pids are set in stone
> at the beginning, on-demand newly discovered pids never intersect
> with those jobacct pids.
>
> Maybe an approach like using the kernel process socket connector,
> based on an initial set of pids ( monitor fork() and exit() ), and then
> proctrack/job_act_gather using that list instead, would be useful
> and feasible ? In that case, I would think additional information
> relative to the obtained pid list would be something in the lines of:
>
> pid_list_t {
>     a_lock;                // Global list lock
>     int n;                 // # of records
>     pid_info_t *info;      // Obvious
>     more ?
> }
>
> pid_info_t {
>     a_lock;                // Record lock
>     int is_active;         // 0 means pids once live but now gone
>     struct jobacctinfo;    // acct for that pid so far.
>     more ?
> }
>
> Given the above, proctrack services would key on pid where active=1.
> And jobacct_gather services would key on jobacctinfo gathered so far,
> regardless of is_active. And I would risk to state proctrack and
> jobacct_gather could be independent of each other, which is not the case
> today, I believe.
>
> I have to admit the above would allow a lot more easily to inject
> out-of-band pids to slurm. I can think of those using mpirun
> in an salloc, or similar. "Similar" is about the sgimpi
> implementation I maintain here at SGI. I understand it
> sounds SGI-specific but I believe there is a generic value
> in the above-mentioned approach that would benefit to SLURM in
> general.
>
> Hopefully, I hope I am not off track ;-)
>
> Too evil ? Not worth ? Comments ?
>
> --
>
> -----------------------------------------------------------
> Michel Bourget - SGI - Linux Software Engineering
> "Past BIOS POST, everything else is extra" (travis)
> -----------------------------------------------------------
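
For what it's worth, the pid list proposed in the quoted message could be sketched in C roughly as follows. This is only an illustration under assumptions, not existing SLURM code: the function names (pid_list_init, pid_list_add, pid_list_add_usage, pid_list_mark_exited, pid_list_count_active, pid_list_total_ticks) are made up here, a plain cpu_ticks counter stands in for struct jobacctinfo, and a single list lock replaces the per-record lock for brevity. The fork() and exit() notifications are assumed to arrive from some external monitor such as the kernel process connector.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical per-pid record, modelled on the quoted pid_info_t. */
typedef struct {
    pid_t pid;
    int is_active;            /* 0: pid was live once but has exited */
    unsigned long cpu_ticks;  /* stand-in for struct jobacctinfo */
} pid_info_t;

/* Hypothetical list, modelled on the quoted pid_list_t. */
typedef struct {
    pthread_mutex_t lock;     /* global list lock */
    int n;                    /* number of records */
    int cap;                  /* allocated capacity */
    pid_info_t *info;
} pid_list_t;

void pid_list_init(pid_list_t *l)
{
    assert(l != NULL);
    pthread_mutex_init(&l->lock, NULL);
    l->n = 0;
    l->cap = 0;
    l->info = NULL;
}

/* Caller must hold l->lock. Only active records match, so a reused
 * pid number does not hit the record of an older, exited process. */
static pid_info_t *find_active(pid_list_t *l, pid_t pid)
{
    for (int i = 0; i < l->n; i++)
        if (l->info[i].pid == pid && l->info[i].is_active)
            return &l->info[i];
    return NULL;
}

/* Called on a fork() event (or for the initial pid set). */
int pid_list_add(pid_list_t *l, pid_t pid)
{
    int rc = -1;
    pthread_mutex_lock(&l->lock);
    if (l->n == l->cap) {
        int cap = l->cap ? 2 * l->cap : 8;
        pid_info_t *p = realloc(l->info, (size_t)cap * sizeof(*p));
        if (p) {
            l->info = p;
            l->cap = cap;
        }
    }
    if (l->n < l->cap) {
        l->info[l->n++] = (pid_info_t){ .pid = pid, .is_active = 1, .cpu_ticks = 0 };
        rc = 0;
    }
    pthread_mutex_unlock(&l->lock);
    return rc;
}

/* Accounting side: add usage to a live pid. Returns -1 if not found. */
int pid_list_add_usage(pid_list_t *l, pid_t pid, unsigned long ticks)
{
    pthread_mutex_lock(&l->lock);
    pid_info_t *r = find_active(l, pid);
    if (r)
        r->cpu_ticks += ticks;
    pthread_mutex_unlock(&l->lock);
    return r ? 0 : -1;
}

/* Called on an exit() event: the record stays, only the flag changes. */
int pid_list_mark_exited(pid_list_t *l, pid_t pid)
{
    pthread_mutex_lock(&l->lock);
    pid_info_t *r = find_active(l, pid);
    if (r)
        r->is_active = 0;
    pthread_mutex_unlock(&l->lock);
    return r ? 0 : -1;
}

/* proctrack-style view: only pids with is_active set. */
int pid_list_count_active(pid_list_t *l)
{
    int count = 0;
    pthread_mutex_lock(&l->lock);
    for (int i = 0; i < l->n; i++)
        count += l->info[i].is_active ? 1 : 0;
    pthread_mutex_unlock(&l->lock);
    return count;
}

/* jobacct_gather-style view: every record, regardless of is_active. */
unsigned long pid_list_total_ticks(pid_list_t *l)
{
    unsigned long total = 0;
    pthread_mutex_lock(&l->lock);
    for (int i = 0; i < l->n; i++)
        total += l->info[i].cpu_ticks;
    pthread_mutex_unlock(&l->lock);
    return total;
}
```

The point of keeping exited records in the list is the one made in the quoted message: the process-tracking view and the accounting view read the same list but filter it differently, so neither has to depend on the other.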
