John, this was fixed in 14.11 (commit d23590dbc94e40a0963fc8d1cee0e6145f782f5c). Because the fix required changing structures, it wasn't possible to fix previous versions. The patch might apply cleanly to 14.03, but it will probably need some massaging with the packs and unpacks. Using this patch will also break backwards compatibility, which you may or may not care about.

Danny

On 09/26/2014 10:19 AM, John Desantis wrote:
Hello all,

First and foremost since this is my first post to the list, I'd like
to thank the Slurm developers for a great and gratis product!

Anyways, to the point.

We have users submitting array jobs via sbatch and using
"-a/--array=n-n" without an issue.  When these jobs are running, we
can use 'squeue' to see tasks under the form of "jobnumber_task".
When we try to query these jobs via the accounting database (checking
on job_table, step_table, and jobcomp_table) and via sacct -j
"jobnumber", we're not getting the complete set of information
associated with the job (batch and exec hosts, etc.).  If the job is
currently running, we can use scontrol to see the job and its steps,
and the full set of information we're looking for.

When I used scontrol to view an array job, I saw that "JobId" for each
of the array tasks incremented along with the task, e.g.:

JobId=23383 ArrayJobId=23383 ArrayTaskId=1
JobId=23384 ArrayJobId=23383 ArrayTaskId=2
JobId=23385 ArrayJobId=23383 ArrayTaskId=3

When I tried to query any of the successive JobIds via sacct or the
DB itself, I didn't get any information.  Only the real JobId "23383"
returned a result within sacct and the DB.  I was able to glean node
information from the scheduler and control daemon logs by looking for
the JobIds listed above.
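
To cross-reference the "jobnumber_task" labels shown by squeue with the
per-task JobIds found in the logs, something like the following Python
sketch may help. It assumes the three fields appear on one line exactly
as in the scontrol excerpt above; the function name map_array_tasks and
the SAMPLE text are illustrative only and are not part of Slurm.

```python
import re

# Sample lines copied from the scontrol excerpt above (illustrative input).
SAMPLE = """\
JobId=23383 ArrayJobId=23383 ArrayTaskId=1
JobId=23384 ArrayJobId=23383 ArrayTaskId=2
JobId=23385 ArrayJobId=23383 ArrayTaskId=3
"""

# Assumption: JobId, ArrayJobId and ArrayTaskId sit on the same line in this order.
# The leading \b keeps "JobId=" from matching inside "ArrayJobId=".
PATTERN = re.compile(r"\bJobId=(\d+)\s+ArrayJobId=(\d+)\s+ArrayTaskId=(\d+)")


def map_array_tasks(text):
    """Return {"<ArrayJobId>_<ArrayTaskId>": JobId} for each matching line."""
    mapping = {}
    for match in PATTERN.finditer(text):
        job_id, array_job_id, task_id = (int(g) for g in match.groups())
        mapping[f"{array_job_id}_{task_id}"] = job_id
    return mapping


if __name__ == "__main__":
    for label, job_id in map_array_tasks(SAMPLE).items():
        print(label, "->", job_id)
```

The resulting dictionary gives the JobId to search for in the scheduler
and control daemon logs for a given "jobnumber_task" label.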

I did find a previous post
https://www.mail-archive.com/[email protected]/msg03344.html which
seems to be my question as well.

Thanks for any insight which can be provided,

John DeSantis
