Hi All, I have a few jobs on my cluster that seem to have failed, and I am trying to figure out why. For active jobs, "scontrol show jobid #" gives me information about how the job was submitted, but I can't seem to get this type of information for a job that has failed. Specifically, I am trying to figure out what gres:gpu resources were requested for specific jobs on a node. Any help would be appreciated.
I have tried sacct but can't seem to find this information. Also, is there an easy way via scontrol or some other command to see what resources are in use on a specific node? For example, I have 2 GPUs per node, but "scontrol show node nodename" does not report the GPUs actually in use or allocated. It just reports the total number of GPUs on that node. Thanks, -J
