On 02/24/2012 04:19 PM, Moe Jette wrote:
> The proctrack/cgroup plugin is designed to more easily support pids from
> processes launched outside of slurm. As I recall, the path used for
> cgroups is of this form:
> user_id/job_id/step_id/task_id
>
That sounds interesting, but it only holds if you know the container id!
And mileage varies depending on the jobacct_gather plugin in use.

For proctrack/linuxproc, OOB pids don't exist. Nevertheless, I found a
way to handle sgimpi OOB pids, which I call the *sentinels*: an additional
structure added to slurmd_task_info_t. Well, that last statement may not
be relevant to this thread yet.

Maybe cgroup is more relevant than linuxproc in the presence of
jobacct_gather/linux. At least, it looks like the cgroup flavor isn't
using cont_id as the only parent pid, hence allowing OOB pids.
Sounds like I could more easily add the sentinel pids to the slurm
cgroup set.

> For example, PAM could put processes into the appropriate cgroup, at
> least to the level of the user_id directory, although I don't know how
> you handle binding tasks to cpus if slurm isn't launching the tasks.
>

I am working on a solution for UV SSI using the *sentinels* work I
already did for launching. The real challenge is associating a given
SGIMPI (OOB) pid with its local SLURM taskid, hence its preset slurm
cpuset, and then calling cpuset_migrate (from libcpuset) on it between
the fork and exec of the given pid, to minimize migration mess,
weirdness and nightmares. I am very close, but not there yet.

For SGIMPI, we don't really care, at least for UV, but I'd like to be
able to address the issue, i.e. connecting the *dots* somehow. Since
sgimpi launches one daemon per node for all tasks, what we do is *lump*
all the non-ltaskid=0 tasks+mems onto the ltaskid=0 cpuset.

Anyway, long story.

A+

> Quoting Michel Bourget<[email protected]>:
>
>> Hi all,
>>
>> was the issue of monitoring pids coming and going addressed (or
>> debated) in the past (or the future, tbd) in regards to proctrack
>> and jobacct_gather?
>>
>> I mean, since pids can fork() children and go away later, proctrack seems
>> not to be able to dynamically track this since it's "on-demand".
>> Same for jobacct_gather, since it's set "in stone" when a step is launched.
>> And, because proctrack is on-demand and jobacct_gather pids are set in stone
>> at the beginning, newly discovered on-demand pids never intersect
>> with those jobacct pids.
>>
>> Maybe an approach like using the kernel process socket connector,
>> based on an initial set of pids (monitor fork() and exit()), and then
>> proctrack/jobacct_gather using that list instead, would be useful
>> and feasible? In that case, I would think additional information
>> relative to the obtained pid list would be something along the lines of:
>>
>>   pid_list_t {
>>       a_lock;              // Global list lock
>>       int n;               // # of records
>>       pid_info_t *info;    // Obvious
>>       more ?
>>   }
>>
>>   pid_info_t {
>>       a_lock;              // Record lock
>>       int is_active;       // 0 means pid once lived but is now gone
>>       struct jobacctinfo;  // acct for that pid so far.
>>       more ?
>>   }
>>
>> Given the above, proctrack services would key on pids where is_active=1.
>> And jobacct_gather services would key on the jobacctinfo gathered so far,
>> regardless of is_active. And I would venture to say proctrack and
>> jobacct_gather could be independent of each other, which is not the case
>> today, I believe.
>>
>> I have to admit the above would make it a lot easier to inject
>> out-of-band pids into slurm. I can think of those using mpirun
>> in an salloc, or similar. "Similar" is about the sgimpi
>> implementation I maintain here at SGI. I understand it
>> sounds SGI-specific, but I believe there is generic value
>> in the above-mentioned approach that would benefit SLURM in
>> general.
>>
>> Hopefully I am not off track ;-)
>>
>> Too evil? Not worth it? Comments?
>>
>> --
>>
>> -----------------------------------------------------------
>> Michel Bourget - SGI - Linux Software Engineering
>> "Past BIOS POST, everything else is extra" (travis)
>> -----------------------------------------------------------
>>

--

-----------------------------------------------------------
Michel Bourget - SGI - Linux Software Engineering
"Past BIOS POST, everything else is extra" (travis)
-----------------------------------------------------------
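
To make the pid_list_t / pid_info_t idea from the original post a bit more concrete, here is a minimal C sketch. It is only an illustration under several assumptions, not Slurm code: acct_sample_t is a hypothetical stand-in for struct jobacctinfo, the function names (pid_list_on_fork, pid_list_on_exit, and so on) are invented for this example, the list has a fixed capacity, and the fork/exit events are assumed to be delivered by some listener (for instance one built on the kernel process connector), which is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for Slurm's struct jobacctinfo. */
typedef struct {
    unsigned long cpu_ticks;
} acct_sample_t;

typedef struct {
    pthread_mutex_t lock;   /* record lock */
    pid_t pid;
    int is_active;          /* 0 means pid once lived but is now gone */
    acct_sample_t acct;     /* accounting gathered for that pid so far */
} pid_info_t;

typedef struct {
    pthread_mutex_t lock;   /* global list lock */
    int n;                  /* # of records in use */
    int cap;                /* fixed capacity (assumption of this sketch) */
    pid_info_t *info;
} pid_list_t;

int pid_list_init(pid_list_t *l, int cap)
{
    assert(l != NULL && cap > 0);
    l->info = calloc((size_t)cap, sizeof(*l->info));
    if (l->info == NULL)
        return -1;
    l->n = 0;
    l->cap = cap;
    if (pthread_mutex_init(&l->lock, NULL) != 0) {
        free(l->info);
        return -1;
    }
    return 0;
}

/* Caller must hold l->lock. Returns the live record for pid, or NULL. */
static pid_info_t *find_active_locked(pid_list_t *l, pid_t pid)
{
    for (int i = 0; i < l->n; i++)
        if (l->info[i].is_active && l->info[i].pid == pid)
            return &l->info[i];
    return NULL;
}

/* Called when a fork() of a tracked process is observed. */
int pid_list_on_fork(pid_list_t *l, pid_t pid)
{
    int rc = -1;
    pthread_mutex_lock(&l->lock);
    if (l->n < l->cap) {
        pid_info_t *r = &l->info[l->n++];
        pthread_mutex_init(&r->lock, NULL);
        r->pid = pid;
        r->is_active = 1;
        r->acct.cpu_ticks = 0;
        rc = 0;
    }
    pthread_mutex_unlock(&l->lock);
    return rc;
}

/* Called on exit(): the record stays so its accounting is not lost. */
int pid_list_on_exit(pid_list_t *l, pid_t pid)
{
    int rc = -1;
    pthread_mutex_lock(&l->lock);
    pid_info_t *r = find_active_locked(l, pid);
    if (r != NULL) {
        pthread_mutex_lock(&r->lock);
        r->is_active = 0;
        pthread_mutex_unlock(&r->lock);
        rc = 0;
    }
    pthread_mutex_unlock(&l->lock);
    return rc;
}

/* Adds CPU ticks to the accounting of a live pid. */
int pid_list_add_cpu(pid_list_t *l, pid_t pid, unsigned long ticks)
{
    int rc = -1;
    pthread_mutex_lock(&l->lock);
    pid_info_t *r = find_active_locked(l, pid);
    if (r != NULL) {
        pthread_mutex_lock(&r->lock);
        r->acct.cpu_ticks += ticks;
        pthread_mutex_unlock(&r->lock);
        rc = 0;
    }
    pthread_mutex_unlock(&l->lock);
    return rc;
}

/* proctrack-style view: only pids with is_active=1. Returns the count. */
int pid_list_active(pid_list_t *l, pid_t *out, int max)
{
    int count = 0;
    pthread_mutex_lock(&l->lock);
    for (int i = 0; i < l->n && count < max; i++)
        if (l->info[i].is_active)
            out[count++] = l->info[i].pid;
    pthread_mutex_unlock(&l->lock);
    return count;
}

/* jobacct_gather-style view: every record, regardless of is_active. */
unsigned long pid_list_total_cpu(pid_list_t *l)
{
    unsigned long total = 0;
    pthread_mutex_lock(&l->lock);
    for (int i = 0; i < l->n; i++)
        total += l->info[i].acct.cpu_ticks;
    pthread_mutex_unlock(&l->lock);
    return total;
}
```

The point of the sketch is the split between the two views: pid_list_active only reports live pids, while pid_list_total_cpu keeps counting what exited pids accumulated. Since every writer here takes the list lock first, the per-record lock is redundant as written; it is kept only to mirror the proposed layout, and a finer-grained design would release the list lock before touching a record.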
