On 02/24/2012 04:19 PM, Moe Jette wrote:
> The proctrack/cgroup plugin is designed to more easily support pids from
> processes launched outside of slurm. As I recall, the path used for
> cgroups is of this form:
> user_id/job_id/step_id/task_id
>

That sounds interesting, but it only holds if you know the container id!
And mileage varies depending on the jobacct_gather plugin in use.
For proctrack/linuxproc, OOB (out-of-band) pids don't exist. Nevertheless,
I found a way to handle sgimpi OOB pids, which I call the *sentinels*:
an additional structure added to slurmd_task_info_t.

Well, that last statement is maybe not relevant to this thread yet.
Maybe cgroup is more relevant than linuxproc in the presence
of jobacct_gather/linux. At least, it looks like the cgroup flavor
isn't using cont_id as the only parent pid, hence allowing
OOB pids. Sounds like I could more easily add the sentinel pids
to the slurm cgroup set.
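
To make the "add the sentinel pids to the cgroup set" idea concrete, here
is a minimal sketch, not what the plugin actually does. It assumes the
user_id/job_id/step_id/task_id layout recalled above, a caller-supplied
cgroup root, and a cgroup v1 hierarchy, where writing a pid to a group's
`tasks` file moves that task into the group. The function names
cgroup_tasks_path() and cgroup_attach_pid() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Build "<root>/<uid>/<job>/<step>/<task>/tasks".
 * ASSUMPTION: the directory layout is only the one recalled in this thread;
 * the real plugin may name its directories differently.
 * Returns 0 on success, -1 if the arguments are bad or buf is too small.
 */
int cgroup_tasks_path(char *buf, size_t len, const char *root,
                      unsigned uid, unsigned job, unsigned step, unsigned task)
{
    if (buf == NULL || root == NULL || strlen(root) == 0)
        return -1;
    int n = snprintf(buf, len, "%s/%u/%u/%u/%u/tasks",
                     root, uid, job, step, task);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Attach one (possibly out-of-band) pid by writing it to a "tasks" file.
 * Returns 0 on success, -1 on failure.
 */
int cgroup_attach_pid(const char *tasks_path, pid_t pid)
{
    assert(tasks_path != NULL);
    FILE *f = fopen(tasks_path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", (long)pid) > 0;
    /* A write error only shows up when the buffer is flushed, so the
     * result of fclose() has to be checked as well. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A sentinel would call cgroup_tasks_path() with the ids of the task it
stands for, then cgroup_attach_pid() for each OOB pid it discovers.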


> For example, PAM could put processes into the appropriate cgroup, at
> least to the level of the user_id directory, although I don't know how
> you handle binding tasks to cpus if slurm isn't launching the tasks.
>

I am working on a solution for UV SSI using the *sentinels* work
I already did for launching. The real challenge is associating
a given SGIMPI (OOB) pid with its local SLURM taskid, hence
its preset slurm cpuset, and then calling cpuset_migrate() (from libcpuset)
on it between the fork and the exec of the given pid, to minimize migration
mess, weirdness and nightmares. I am very close, but not there yet.

For SGIMPI, we don't really care, at least for UV, but I'd like to be able
to address the issue, i.e. connecting the *dots* somehow. Since sgimpi
launches one daemon per node for all tasks, what we do is *lump* all
the non-ltaskid=0 tasks+mems onto the ltaskid=0 cpuset. Anyway, long story.

A+


> Quoting Michel Bourget<[email protected]>:
>
>> Hi all,
>>
>> was the issue of monitoring pids coming and going addressed (or
>> debated) in the past (or planned for the future, tbd) with regard to
>> proctrack and jobacct_gather?
>>
>> I mean, since pids can fork() children and go away later, proctrack seems
>> not to be able to track this dynamically since it's "on-demand". Same for
>> jobacct_gather, since its pid set is "in stone" once a step is launched.
>> And because proctrack is on-demand and jobacct_gather pids are set in stone
>> at the beginning, newly discovered on-demand pids never intersect
>> with those jobacct pids.
>>
>> Maybe an approach like using the kernel process connector socket,
>> based on an initial set of pids (monitoring fork() and exit()), and then
>> having proctrack/jobacct_gather use that list instead, would be useful
>> and feasible? In that case, I would think the additional information
>> attached to the obtained pid list would be something along the lines of:
>>
>>    pid_list_t {
>>           a_lock;             // Global list lock
>>           int n;              // # of records
>>           pid_info_t *info;   // Obvious
>>           more ?
>>    }
>>
>>    pid_info_t {
>>           a_lock;             // Record lock
>>           int is_active;      // 0 means the pid was once live but is now gone
>>           struct jobacctinfo; // acct for that pid so far.
>>           more ?
>>    }
>>
>> Given the above, proctrack services would key on pids where is_active=1,
>> and jobacct_gather services would key on the jobacctinfo gathered so far,
>> regardless of is_active. And I would venture that proctrack and
>> jobacct_gather could then be independent of each other, which is not the case
>> today, I believe.
>>
>> I have to admit the above would also make it a lot easier to inject
>> out-of-band pids into slurm. I can think of those using mpirun
>> in an salloc, or similar. "Similar" is about the sgimpi
>> implementation I maintain here at SGI.  I understand it
>> sounds SGI-specific, but I believe there is generic value
>> in the above-mentioned approach that would benefit SLURM in
>> general.
>>
>> Hopefully I am not off track ;-)
>>
>> Too evil? Not worth it? Comments?
>>
>> --
>>
>> -----------------------------------------------------------
>>        Michel Bourget - SGI - Linux Software Engineering
>>       "Past BIOS POST, everything else is extra" (travis)
>> -----------------------------------------------------------
>>
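
For what it's worth, the pid_list_t / pid_info_t idea quoted above could be
sketched in C roughly as follows. This is only an illustration under
assumptions, not existing slurm code: acct_sketch_t is a hypothetical
stand-in for struct jobacctinfo, a single list lock replaces the
per-record locks, and the on_fork/on_exit entry points are assumed to be
fed by whatever delivers fork() and exit() events (the process connector,
for instance).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for Slurm's struct jobacctinfo. */
typedef struct {
    unsigned long cpu_ticks;
} acct_sketch_t;

typedef struct {
    pid_t pid;
    int is_active;        /* 0: pid was once live but is now gone */
    acct_sketch_t acct;   /* accounting gathered so far for this pid */
} pid_info_t;

typedef struct {
    pthread_mutex_t lock; /* global list lock (no per-record lock here) */
    int n;                /* number of records in use */
    int cap;              /* number of records allocated */
    pid_info_t *info;
} pid_list_t;

int pid_list_init(pid_list_t *l)
{
    assert(l != NULL);
    l->n = 0;
    l->cap = 0;
    l->info = NULL;
    return pthread_mutex_init(&l->lock, NULL);
}

/* Find the live record for pid. Caller must hold l->lock. */
static pid_info_t *find_active_locked(pid_list_t *l, pid_t pid)
{
    for (int i = 0; i < l->n; i++)
        if (l->info[i].is_active && l->info[i].pid == pid)
            return &l->info[i];
    return NULL;
}

/* Record a newly seen pid (e.g. from a fork event). Returns 0 or -1. */
int pid_list_on_fork(pid_list_t *l, pid_t pid)
{
    int rc = -1;
    pthread_mutex_lock(&l->lock);
    if (l->n == l->cap) {
        int cap = l->cap ? l->cap * 2 : 8;
        pid_info_t *p = realloc(l->info, (size_t)cap * sizeof(*p));
        if (p == NULL)
            goto out;
        l->info = p;
        l->cap = cap;
    }
    l->info[l->n++] = (pid_info_t){ .pid = pid, .is_active = 1 };
    rc = 0;
out:
    pthread_mutex_unlock(&l->lock);
    return rc;
}

/* Mark a pid as gone (e.g. from an exit event); its accounting is kept. */
int pid_list_on_exit(pid_list_t *l, pid_t pid)
{
    pthread_mutex_lock(&l->lock);
    pid_info_t *p = find_active_locked(l, pid);
    if (p != NULL)
        p->is_active = 0;
    pthread_mutex_unlock(&l->lock);
    return p != NULL ? 0 : -1;
}

/* Add accounting to a live pid. Returns -1 if the pid is not active. */
int pid_list_add_ticks(pid_list_t *l, pid_t pid, unsigned long ticks)
{
    pthread_mutex_lock(&l->lock);
    pid_info_t *p = find_active_locked(l, pid);
    if (p != NULL)
        p->acct.cpu_ticks += ticks;
    pthread_mutex_unlock(&l->lock);
    return p != NULL ? 0 : -1;
}

/* proctrack-style view: only records with is_active = 1. */
int pid_list_count_active(pid_list_t *l)
{
    int count = 0;
    pthread_mutex_lock(&l->lock);
    for (int i = 0; i < l->n; i++)
        count += l->info[i].is_active;
    pthread_mutex_unlock(&l->lock);
    return count;
}

/* jobacct_gather-style view: all records, regardless of is_active. */
unsigned long pid_list_total_ticks(pid_list_t *l)
{
    unsigned long total = 0;
    pthread_mutex_lock(&l->lock);
    for (int i = 0; i < l->n; i++)
        total += l->info[i].acct.cpu_ticks;
    pthread_mutex_unlock(&l->lock);
    return total;
}

void pid_list_destroy(pid_list_t *l)
{
    free(l->info);
    pthread_mutex_destroy(&l->lock);
}
```

The two views at the end are the point: the active count ignores exited
pids, while the accounting total still includes what they consumed.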


-- 

-----------------------------------------------------------
      Michel Bourget - SGI - Linux Software Engineering
     "Past BIOS POST, everything else is extra" (travis)
-----------------------------------------------------------
