On 02/24/2012 01:24 PM, Danny Auble wrote:
> Michel, are you using JobAcctGatherFrequency?  Most people set it to 30,
> but we have tested it down to 5 without noticing any real noise.
> Obviously it is application dependent, though.  Nothing should be set in
> stone.

Yes, I use a non-zero JobAcctGatherFrequency. The problem is not there.

> Everything should be dynamic.  If you aren't polling with
> JobAcctGatherFrequency then the job_acct_gather plugin will only look at
> the beginning and at the end of the step (which is probably what you are
> seeing).
>

Of course it polls, but it polls a set-in-stone list of accounting tasks,
established ( well, at least in 2.3.3 iirc ) by jobacct_gather_g_add_task
( called from _fork_all_tasks ).

In the case of jobacct_common_add_task() ( gather_linux ), this list is
set once. The jobacct_gather_linux task_list will never *intersect* with
the new child pids that proctrack could discover on its side.
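
To make the non-intersection concrete, here is a minimal sketch in C. It is
not Slurm code: task_list_t, task_list_contains() and count_untracked() are
hypothetical stand-ins for a task list filled once at step launch and for a
later proctrack-style scan.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the task list filled once at step launch. */
typedef struct {
    const pid_t *pids;
    size_t n;
} task_list_t;

/* Return 1 if pid was recorded at launch time, 0 otherwise. */
static int task_list_contains(const task_list_t *tl, pid_t pid)
{
    for (size_t i = 0; i < tl->n; i++)
        if (tl->pids[i] == pid)
            return 1;
    return 0;
}

/* Count the pids found by a later scan that the launch-time list
 * does not know about, i.e. the children that are never accounted. */
static size_t count_untracked(const task_list_t *tl,
                              const pid_t *found, size_t nfound)
{
    size_t missing = 0;
    for (size_t i = 0; i < nfound; i++)
        if (!task_list_contains(tl, found[i]))
            missing++;
    return missing;
}
```

With a launch-time list of two pids and a later scan that also sees two
forked children, count_untracked() reports those two children as unknown to
the accounting side, however often it polls.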

( Just saw Moe's reply ... Lemme continue there )

PS: I should have mentioned that my question is specifically about
jobacct_gather/linux.



> Danny
>
> On 02/24/12 09:08, Michel Bourget wrote:
>> Hi all,
>>
>> Was the issue of monitoring pids that come and go addressed ( or
>> debated ) in the past ( or is it planned for the future, tbd ) with
>> regard to proctrack and job_acct_gather ?
>>
>> I mean, since pids can fork() children and go away later, proctrack seems
>> not to be able to dynamically track this since it's "on-demand". Same for
>> jobacct_gather since it's set "in stone" when a step is launched.
>> And, because proctrack is on-demand and jobacct_gather pids are set in stone
>> at the beginning, on-demand newly discovered pids never intersect
>> with those jobacct pids.
>>
>> Maybe an approach like using the kernel process socket connector,
>> based on an initial set of pids ( monitor fork() and exit() ), and then
>> having proctrack/job_acct_gather use that list instead, would be useful
>> and feasible ? In that case, I would think the additional information
>> attached to the obtained pid list would be something along the lines of:
>>
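>>     typedef struct pid_info pid_info_t;
>>
>>     typedef struct {
>>            pthread_mutex_t a_lock; // Global list lock (pthread mutex assumed)
>>            int n;                  // # of records
>>            pid_info_t *info;       // Array of n records
>>            /* more ? */
>>     } pid_list_t;
>>
>>     struct pid_info {
>>            pthread_mutex_t a_lock;  // Record lock (pthread mutex assumed)
>>            int is_active;           // 0 means the pid was once live but is now gone
>>            struct jobacctinfo acct; // acct for that pid so far.
>>            /* more ? */
>>     };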
>>
>> Given the above, proctrack services would key on pids where is_active=1.
>> And jobacct_gather services would key on the jobacctinfo gathered so far,
>> regardless of is_active. And I would venture that proctrack and
>> jobacct_gather could then be independent of each other, which is not the
>> case today, I believe.
>>
>> I have to admit the above would also make it a lot easier to inject
>> out-of-band pids into slurm. I am thinking of those using mpirun
>> in an salloc, or similar. "Similar" is about the sgimpi
>> implementation I maintain here at SGI.  I understand it
>> sounds SGI-specific but I believe there is generic value
>> in the above-mentioned approach that would benefit SLURM in
>> general.
>>
>> Hopefully I am not off track ;-)
>>
>> Too evil ? Not worth it ? Comments ?
>>
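
As a rough illustration of the process connector idea quoted above: the
kernel delivers struct proc_event messages (declared in <linux/cn_proc.h>)
over a NETLINK_CONNECTOR socket. The sketch below only shows how fork() and
exit() events might be picked out of such a message. The socket setup and
the PROC_CN_MCAST_LISTEN subscription, which normally need elevated
privileges, are left out, and pid_event_t and decode_proc_event() are
hypothetical names, not part of Slurm or of the kernel API.

```c
#include <assert.h>
#include <sys/types.h>
#include <linux/cn_proc.h>

/* Hypothetical summary of the two event kinds the proposal cares about. */
typedef enum { EV_IGNORED, EV_FORK, EV_EXIT } ev_kind_t;

typedef struct {
    ev_kind_t kind;
    pid_t parent;   /* meaningful for EV_FORK only */
    pid_t pid;      /* new child on fork, exiting process on exit */
} pid_event_t;

/* Reduce a kernel proc_event to "pid appeared" / "pid went away". */
static pid_event_t decode_proc_event(const struct proc_event *ev)
{
    pid_event_t out = { EV_IGNORED, 0, 0 };

    switch (ev->what) {
    case PROC_EVENT_FORK:
        out.kind = EV_FORK;
        out.parent = ev->event_data.fork.parent_tgid;
        out.pid = ev->event_data.fork.child_pid;
        break;
    case PROC_EVENT_EXIT:
        out.kind = EV_EXIT;
        out.pid = ev->event_data.exit.process_pid;
        break;
    default:
        /* exec, uid/gid changes, etc. are not needed for pid tracking */
        break;
    }
    return out;
}
```

A tracker seeded with the step's initial pids would add out.pid on EV_FORK
when out.parent is already known, and mark out.pid as gone on EV_EXIT.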

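The pid_list_t / pid_info_t idea quoted above could be sketched in C roughly
as follows. This is only one possible reading of the proposal: acct_stub_t
is a hypothetical one-counter stand-in for struct jobacctinfo, a single list
lock replaces the per-record locks, and all pid_list_* function names are
made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct jobacctinfo: one counter is enough here. */
typedef struct { unsigned long cpu_ticks; } acct_stub_t;

typedef struct {
    pid_t pid;
    int is_active;     /* 0: pid was once live but is now gone */
    acct_stub_t acct;  /* accounting gathered for that pid so far */
} pid_info_t;

typedef struct {
    pthread_mutex_t lock;  /* global list lock; per-record locks omitted */
    size_t n, cap;         /* records in use, records allocated */
    pid_info_t *info;
} pid_list_t;

static void pid_list_init(pid_list_t *l)
{
    pthread_mutex_init(&l->lock, NULL);
    l->n = l->cap = 0;
    l->info = NULL;
}

/* Caller holds l->lock. Returns the live record for pid, or NULL. */
static pid_info_t *find_active(pid_list_t *l, pid_t pid)
{
    for (size_t i = 0; i < l->n; i++)
        if (l->info[i].pid == pid && l->info[i].is_active)
            return &l->info[i];
    return NULL;
}

/* Seed pid or fork() event: append a live record. Returns 0, or -1 if
 * memory could not be allocated. */
static int pid_list_on_fork(pid_list_t *l, pid_t pid)
{
    int rc = -1;
    pthread_mutex_lock(&l->lock);
    if (l->n == l->cap) {
        size_t cap = l->cap ? l->cap * 2 : 8;
        pid_info_t *p = realloc(l->info, cap * sizeof(*p));
        if (p) {
            l->info = p;
            l->cap = cap;
        }
    }
    if (l->n < l->cap) {
        l->info[l->n++] = (pid_info_t){ .pid = pid, .is_active = 1, .acct = { 0 } };
        rc = 0;
    }
    pthread_mutex_unlock(&l->lock);
    return rc;
}

/* exit() event: keep the record and its accounting, only mark it gone. */
static void pid_list_on_exit(pid_list_t *l, pid_t pid)
{
    pthread_mutex_lock(&l->lock);
    pid_info_t *rec = find_active(l, pid);
    if (rec)
        rec->is_active = 0;
    pthread_mutex_unlock(&l->lock);
}

/* Polling side: credit ticks sampled for a live pid; others are ignored. */
static void pid_list_add_cpu(pid_list_t *l, pid_t pid, unsigned long ticks)
{
    pthread_mutex_lock(&l->lock);
    pid_info_t *rec = find_active(l, pid);
    if (rec)
        rec->acct.cpu_ticks += ticks;
    pthread_mutex_unlock(&l->lock);
}

/* proctrack-style view: only pids that are still alive. */
static size_t pid_list_count_active(pid_list_t *l)
{
    size_t active = 0;
    pthread_mutex_lock(&l->lock);
    for (size_t i = 0; i < l->n; i++)
        active += (l->info[i].is_active != 0);
    pthread_mutex_unlock(&l->lock);
    return active;
}

/* jobacct_gather-style view: everything gathered so far, dead pids included. */
static unsigned long pid_list_total_cpu(pid_list_t *l)
{
    unsigned long total = 0;
    pthread_mutex_lock(&l->lock);
    for (size_t i = 0; i < l->n; i++)
        total += l->info[i].acct.cpu_ticks;
    pthread_mutex_unlock(&l->lock);
    return total;
}
```

The point of the two views is that a child which forked and exited between
polls no longer shows up for proctrack, while whatever was gathered for it
still counts toward the step's accounting.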

-- 

-----------------------------------------------------------
      Michel Bourget - SGI - Linux Software Engineering
     "Past BIOS POST, everything else is extra" (travis)
-----------------------------------------------------------
