Hi All,

I have a few jobs in my cluster that seem to have failed, and I am
trying to figure out why.  For active jobs, "scontrol show jobid #"
shows how the job was submitted, but I can't seem to get the same
information for a job that has failed.  Specifically, I am trying to
find out which gres:gpu resources were requested by specific jobs on
a node.  Any help would be appreciated.

I have tried sacct but can't seem to find this information.  Also, is
there an easy way, via scontrol or some other command, to see what
resources are in use on a specific node?  For example, I have 2 GPUs
per node, but "scontrol show node nodename" does not report the GPUs
actually in use or allocated.  It just reports the total number of
GPUs on that node.

Thanks,
-J
