Hello all,

First and foremost, since this is my first post to the list, I'd like to thank the Slurm developers for a great and gratis product!

Anyway, to the point. We have users submitting array jobs via sbatch with "-a/--array=n-n" without issue. While these jobs are running, we can use 'squeue' to see the tasks in the form "jobnumber_task".

When we query these jobs via the accounting database (checking job_table, step_table, and jobcomp_table) and via sacct -j "jobnumber", we don't get the complete set of information associated with the job (batch and exec hosts, etc.). If the job is currently running, we can use scontrol to see the job and its steps, along with the full set of information we're looking for.

When I used scontrol to view an array job, I saw that the "JobId" of each array task incremented with each task, e.g.:

    JobId=23383 ArrayJobId=23383 ArrayTaskId=1
    JobId=23384 ArrayJobId=23383 ArrayTaskId=2
    JobId=23385 ArrayJobId=23383 ArrayTaskId=3

When I tried to query any of the successive JobIds via sacct or the DB itself, I didn't get any information. Only the real JobId "23383" returned a result in sacct and the DB. I was able to glean node information from the scheduler and control daemon logs by looking for the JobIds listed above.

I did find a previous post which seems to ask the same question:

https://www.mail-archive.com/[email protected]/msg03344.html

Thanks for any insight which can be provided,
John DeSantis
